Abstract


For the model of probabilistic labelled transition systems that 
allow for the co-existence of nondeterminism and probabilities, we 
present two notions of bisimulation metrics: one is state-based and the 
other is distribution-based. We provide a sound and complete modal 
characterisation for each of them, using real-valued modal logics based 
on Hennessy-Milner logic. The logic for characterising the state-based 
metric is much simpler than an earlier logic proposed by Desharnais et 
al. as it uses only two non-expansive operators rather than the general 
class of non-expansive operators. For the kernels of the two metrics, 
which correspond to two notions of bisimilarity, we give a comprehensive 
comparison with some typical distribution-based bisimilarities in the 
literature. 


Keywords: Probabilistic labelled transition systems; Behavioral pseudometrics;
Real-valued modal logics


1 Introduction 


Bisimulation is an important proof technique for establishing behavioural 
equivalences of concurrent systems. In probabilistic concurrency theory,
there are roughly two kinds of bisimulations: one is state-based, as it is
directly defined over states and then lifted to distributions, and the other is
distribution-based, as it is a relation between distributions. The former was
originally defined in [37] to represent a branching time semantics; the latter,
as defined in [31, 24, 14], represents a linear time semantics.


In correspondence with those bisimulations, there are two notions of 
behavioural pseudometrics (simply called metrics in the current work). They 
are more robust ways of formalising behavioural similarity between formal 
systems than bisimulations because, particularly in the probabilistic setting, 
bisimulations are too sensitive to probabilities (a very small perturbation of 
the probabilities would render two systems non-bisimilar). A metric gives a 
quantitative measure of the distance between two systems and distance 0 
usually means that the two systems are bisimilar. A logical characterisation 
of the state-based bisimulation metric for labelled Markov processes is given 
in [17]. For a more general model of labelled concurrent Markov chains 
(LCMCs) that allow for the co-existence of nondeterminism and probabilities, 
a weak bisimulation metric is proposed in [18]. Its logical characterisation 
uses formulae like $h \circ f$, which is the composition of formula $f$ with any
non-expansive operator $h$ on the interval $[0,1]$, i.e. $|h(x) - h(y)| \le |x - y|$
for any $x, y \in [0,1]$. A natural question then arises: instead of the general
class of non-expansive operators, is it possible to use only a few simple 
non-expansive operators without losing the capability of characterising the 
bisimulation metric? 


In the current work, we give a positive answer to the above question. 
For simplicity of presentation, we focus on strong bisimulation metrics. But 
the proof idea can be generalised to the weak case. We work in the framework 
of probabilistic labelled transition systems (pLTSs) that are essentially the 
same as LCMCs, so the interplay of nondeterminism and probabilities is 
allowed. We provide a modal characterisation of the state-based bisimulation 
metric closely in line with the classical Hennessy-Milner logic (HML) [30]. 
Our variant of HML makes use of state formulae and distribution formulae, 
which are formulae evaluated at states and distributions, respectively, and 
yield success probabilities. We use merely two non-expansive operators: 
negation ($\neg\varphi$) and testing ($\varphi \ominus p$). Negation is self-explanatory and the
testing operator checks if a state satisfies a property with a certain threshold
probability. More precisely, if state $s$ satisfies formula $\varphi$ with probability $q$,
then it satisfies $\neg\varphi$ with probability $1 - q$, and satisfies $\varphi \ominus p$ with probability
$q - p$ if $q > p$ and $0$ otherwise. In other words, we do not need the general
class of non-expansive operators because negation and testing, together with
other modalities inherited from the classical HML, are expressive enough
to characterise bisimulation metrics*. As regards the characterisation
of the distribution-based bisimulation metric, we drop state formulae and use
distribution formulae only. In addition, we show that the distribution-based 
metric is a lower bound of the state-based metric when the latter is lifted to 
distributions. 

The kernels of the two metrics generate two notions of bisimilarity: 
one is state-based and the other is distribution-based. The state-based 
bisimilarity is widely accepted by the community of probabilistic concurrency 
theory, and it admits elegant characterisations from metric, logical, and 
algorithmic perspectives [11]. On the contrary, there is no general agreement 
on what is a good notion of distribution-based bisimilarity. We compare 
the two bisimilarities induced by our metrics with some typical notions of 
distribution-based bisimilarities proposed in the literature. Our distribution- 
based bisimilarity turns out to coincide with the one defined in [24] and they 
constitute the coarsest bisimilarity for distributions. 

The rest of this paper is organised as follows. Section 2 provides some 
basic concepts on pLTSs. Section 3 defines a two-sorted modal logic that 
leads to a sound and complete characterisation of the state-based bisimulation 
metric. Section 4 gives a similar characterisation for the distribution-based 
bisimulation metric. In Section 5 we compare the two metrics discussed in 
the previous two sections. In Section 6 we compare the two bisimilarities 
generated by the two metrics with some distribution-based bisimilarities 
that appeared in the literature. In Section 7 we review some related work. 
Finally, we conclude in Section 8. 

An extended abstract of this paper has appeared as [19]. All the proofs 
omitted there are now given in great detail. 


2 Preliminaries 


Let $S$ be a countable set. A (discrete) probability subdistribution over $S$ is
a function $\Delta : S \to [0,1]$ with $\sum_{s \in S} \Delta(s) \le 1$. It is a (full) distribution if
$\sum_{s \in S} \Delta(s) = 1$. Its support, written $\lceil\Delta\rceil$, is defined to be the set
$\{s \in S \mid \Delta(s) > 0\}$. Let $D_{sub}(S)$ (resp. $D(S)$) denote the set of all subdistributions
(resp. distributions) over $S$. We use $\varepsilon$ to stand for the empty subdistribution,
that is $\varepsilon(s) = 0$ for any $s \in S$. We write $\bar{s}$ for the point distribution, satisfying
$\bar{s}(t) = 1$ if $t = s$, and $0$ otherwise. The total mass of subdistribution $\Delta$,
written $|\Delta|$, is defined as $\sum_{s \in S} \Delta(s)$. A weight function† $\omega \in D(S \times S)$ for
$(\Delta, \Theta) \in D(S) \times D(S)$ is given if it satisfies the two conditions:
$\sum_{t \in S} \omega(s,t) = \Delta(s)$ and $\sum_{s \in S} \omega(s,t) = \Theta(t)$ for all $s,t \in S$. We denote the set of all
weight functions for $(\Delta, \Theta)$ by $\Omega(\Delta, \Theta)$. If $\{\Delta_i\}_{i \in I}$ is a finite collection of
subdistributions and $\{p_i\}_{i \in I}$ is a collection of probabilities with $\sum_{i \in I} p_i \le 1$,
then $\sum_{i \in I} p_i \cdot \Delta_i$ is also a subdistribution with
$(\sum_{i \in I} p_i \cdot \Delta_i)(s) = \sum_{i \in I} p_i \cdot \Delta_i(s)$ for any $s \in S$.

* Notice that we do not claim that negation and testing operators, plus some constant
functions, suffice to approximate all the non-expansive operators on the unit interval. That
claim is too strong to be true. For example, the operator $f(x) = \frac{1}{2}x$ cannot be represented
by those operators.

A metric $d$ over a space $S$ is a distance function $d : S \times S \to \mathbb{R}_{\ge 0}$
satisfying: (i) $d(s,t) = 0$ iff $s = t$ (isolation), (ii) $d(s,t) = d(t,s)$ (symmetry),
(iii) $d(s,t) \le d(s,u) + d(u,t)$ (triangle inequality), for any $s,t,u \in S$. If we
replace (i) with $d(s,s) = 0$, we obtain a pseudometric. In this article we
are interested in pseudometrics because two distinct states can still be at
distance zero if their behaviour is similar. But for simplicity, we often use
the term metrics though we really mean pseudometrics. Let $c \in \mathbb{R}_{>0}$ be a
positive real number. A metric $d$ over $S$ is $c$-bounded if $d(s,t) \le c$ for any
$s,t \in S$. In the rest of this article, we restrict ourselves to 1-bounded metrics.

Let $d : S \times S \to [0,1]$ be a metric over $S$. We can lift it to be a metric
over $D(S)$ by using the Kantorovich metric [34] $K(d) : D(S) \times D(S) \to [0,1]$
defined via a linear programming problem as follows:

$$K(d)(\Delta, \Theta) = \min_{\omega \in \Omega(\Delta,\Theta)} \sum_{s,t \in S} d(s,t) \cdot \omega(s,t) \tag{1}$$

for $\Delta, \Theta \in D(S)$. The dual of the above linear programming problem is the
following:

$$\begin{aligned}
\max\ & \sum_{s \in S} (\Delta(s) - \Theta(s))\, x_s \\
\text{subject to}\ & 0 \le x_s \le 1, \\
& \forall s,t \in S:\ x_s - x_t \le d(s,t).
\end{aligned} \tag{2}$$

The duality theorem in linear programming guarantees that both problems
have the same optimal value.
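The relationship between (1) and (2) can be illustrated with a small sketch. It does not solve the linear program; it only checks that a candidate weight function is feasible for (1), that a candidate vector is feasible for (2), and that their objective values agree, which by duality certifies that both are optimal. The three-point space, the metric `d`, and all function names are assumptions made for this illustration.

```python
from fractions import Fraction as F

def is_weight_function(w, delta, theta):
    """True if w has marginals delta (over first components) and theta (over second)."""
    rows = all(sum(w.get((s, t), 0) for t in theta) == delta[s] for s in delta)
    cols = all(sum(w.get((s, t), 0) for s in delta) == theta[t] for t in theta)
    supported = all(s in delta and t in theta for (s, t) in w)
    return rows and cols and supported

def primal_cost(d, w):
    """Objective of (1) for one particular weight function w."""
    return sum(d[s][t] * mass for (s, t), mass in w.items())

def is_dual_feasible(d, x):
    """Constraints of (2): 0 <= x_s <= 1 and x_s - x_t <= d(s, t)."""
    in_range = all(0 <= v <= 1 for v in x.values())
    bounded = all(x[s] - x[t] <= d[s][t] for s in x for t in x)
    return in_range and bounded

def dual_value(x, delta, theta):
    """Objective of (2) for one particular vector x."""
    return sum((delta.get(s, 0) - theta.get(s, 0)) * x[s] for s in x)

# Hypothetical 1-bounded metric on S = {u, v, w}.
S = ["u", "v", "w"]
pairs = {("u", "v"): F(1, 2), ("u", "w"): F(1), ("v", "w"): F(1)}
d = {s: {t: F(0) if s == t else pairs.get((s, t), pairs.get((t, s))) for t in S} for s in S}

delta = {"u": F(1, 2), "w": F(1, 2)}          # Delta = 1/2 u + 1/2 w
theta = {"v": F(1)}                           # Theta = point distribution on v
omega = {("u", "v"): F(1, 2), ("w", "v"): F(1, 2)}
x = {"u": F(1, 2), "v": F(0), "w": F(1)}

upper = primal_cost(d, omega)                 # any feasible omega gives an upper bound
lower = dual_value(x, delta, theta)           # any feasible x gives a lower bound
```

Because `upper` and `lower` coincide here, the common value is the Kantorovich distance for this toy instance. For general inputs an LP solver would be needed to find the optimal witnesses.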

Let $d : D(S) \times D(S) \to [0,1]$ be a metric over $D(S)$. We can lift it to be
a metric over the powerset of $D(S)$, written $P(D(S))$, in the standard way
by using the Hausdorff metric $H(d) : P(D(S)) \times P(D(S)) \to [0,1]$ given as
follows

$$H(d)(\Pi_1, \Pi_2) = \max\Big\{ \sup_{\Delta \in \Pi_1} \inf_{\Theta \in \Pi_2} d(\Delta, \Theta),\ \sup_{\Theta \in \Pi_2} \inf_{\Delta \in \Pi_1} d(\Theta, \Delta) \Big\}$$

for all $\Pi_1, \Pi_2 \subseteq D(S)$, whereby $\inf \emptyset = 1$ and $\sup \emptyset = 0$.

† A weight function is also known as a coupling in some literature [46].
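For finite sets the Hausdorff lifting is a direct max/min computation. The sketch below is one possible rendering, with the conventions $\inf \emptyset = 1$ and $\sup \emptyset = 0$ encoded as default values; the function name `hausdorff` and the use of rational numbers as stand-ins for distributions are assumptions for illustration only.

```python
from fractions import Fraction as F

def hausdorff(d, pi1, pi2):
    """H(d)(pi1, pi2) for finite collections pi1, pi2 and a distance function d."""
    def directed(xs, ys):
        # sup over xs of inf over ys; the defaults encode inf(empty)=1, sup(empty)=0
        return max((min((d(x, y) for y in ys), default=1) for x in xs), default=0)
    return max(directed(pi1, pi2), directed(pi2, pi1))

def toy_distance(x, y):
    """A 1-bounded distance on numbers, standing in for a metric on distributions."""
    return min(abs(x - y), 1)

example = hausdorff(toy_distance, [F(0), F(1, 2)], [F(1, 4)])
```

With these conventions, an empty set is at distance 1 from any non-empty set and at distance 0 from itself, which is how a state that cannot perform an action ends up maximally far from one that can.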

Probabilistic labelled transition systems (pLTSs) generalise labelled 
transition systems by allowing for probabilistic choices in the transitions. 
They are essentially simple probabilistic automata [42] without initial states. 


Definition 2.1 A probabilistic labelled transition system is a triple $(S, A, \to)$,
where $S$ is a countable set of states, $A$ is a countable set of actions, and the
relation $\to\ \subseteq S \times A \times D(S)$ is a transition relation.


We write $s \xrightarrow{a} \Delta$ for $(s, a, \Delta) \in\ \to$ and $s \not\xrightarrow{a}$ if there is no $\Delta$ satisfying $s \xrightarrow{a} \Delta$.
Let $der(s,a) = \{\Delta \mid s \xrightarrow{a} \Delta\}$ be the set of all $a$-successor distributions of $s$.
A pLTS is image-finite (resp. deterministic or reactive) if for any state $s$
and action $a$ the set $der(s,a)$ is finite (resp. has at most one element). In
the current work, we focus on image-finite pLTSs with finitely many states.
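A finite, image-finite pLTS can be represented quite directly as nested dictionaries. The following is a minimal sketch under that assumption; the state names, the action names, and the helper functions `der`, `is_deterministic` and `is_well_formed` are illustrative choices rather than anything prescribed by the definition.

```python
from fractions import Fraction as F

# state -> action -> list of successor distributions (each a dict state -> probability)
half = F(1, 2)
plts = {
    "s1": {"b": [{"s2": half, "s3": half}]},
    "s2": {"c": [{"s4": F(1)}]},
    "s3": {"d": [{"s4": F(1)}]},   # hypothetical transition, for illustration only
    "s4": {},                      # deadlock state
}

def der(plts, s, a):
    """The set der(s, a) of a-successor distributions of s (empty list if none)."""
    return plts.get(s, {}).get(a, [])

def is_deterministic(plts):
    """True if every der(s, a) has at most one element."""
    return all(len(succs) <= 1 for acts in plts.values() for succs in acts.values())

def is_well_formed(plts):
    """True if every successor is a full distribution over known states."""
    return all(
        sum(delta.values()) == 1 and all(t in plts for t in delta)
        for acts in plts.values() for succs in acts.values() for delta in succs
    )
```

Since every `der(s, a)` is a finite list, image-finiteness holds by construction in this representation.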


3 State-Based Bisimulation Metrics 


We consider the complete lattice $([0,1]^{S \times S}, \sqsubseteq)$ defined by $d \sqsubseteq d'$ iff $d(s,t) \le d'(s,t)$,
for all $s,t \in S$. For any $D \subseteq [0,1]^{S \times S}$ the least upper bound is given
by $(\bigsqcup D)(s,t) = \sup_{d \in D} d(s,t)$, and the greatest lower bound is given by
$(\bigsqcap D)(s,t) = \inf_{d \in D} d(s,t)$ for all $s,t \in S$. The bottom element $\mathbf{0}$ is the
constant zero function $\mathbf{0}(s,t) = 0$ and the top element $\mathbf{1}$ is the constant one
function $\mathbf{1}(s,t) = 1$ for all $s,t \in S$.


Definition 3.1 A 1-bounded metric $d$ on $S$ is a state-based bisimulation
metric if for all $s,t \in S$ with $d(s,t) < 1$, whenever $s \xrightarrow{a} \Delta$ then there exists
some $t \xrightarrow{a} \Delta'$ with $K(d)(\Delta, \Delta') \le d(s,t)$.


The smallest (wrt. $\sqsubseteq$) state-based bisimulation metric, denoted by $d_s$, is
called state-based bisimilarity metric. Its kernel is the state-based bisimilarity
as defined in [37, 42]. Note that $\mathbf{0}$ does not satisfy Definition 3.1 for general
pLTSs, thus is not a state-based bisimulation metric in general.


Example 3.1 Let us calculate the distance between states $s$ and $t$ in Figure 1.
Firstly, it is clear that $d_s(s_4, t_5) = 0$ because both $s_4$ and $t_5$ are deadlock
states. It follows that $d_s(s_2, t_3) = 0$ because $s_2$ has a unique $c$-transition to $s_4$
and $t_3$ has a unique $c$-transition to $t_5$. On the contrary, $d_s(s_3, t_3) = 1$ because
the two states $s_3$ and $t_3$ perform completely different actions. Secondly, let
$\Delta = \frac{1}{2}\bar{s}_2 + \frac{1}{2}\bar{s}_3$ and $\Theta = \bar{t}_3$. We see that

$$\begin{aligned}
K(d_s)(\Delta, \Theta) &= \min_{\omega \in \Omega(\Delta,\Theta)} d_s(s_2, t_3) \cdot \omega(s_2, t_3) + d_s(s_3, t_3) \cdot \omega(s_3, t_3) \\
&= \min_{\omega \in \Omega(\Delta,\Theta)} 0 \cdot \omega(s_2, t_3) + 1 \cdot \omega(s_3, t_3) \\
&= 0 \cdot \tfrac{1}{2} + 1 \cdot \tfrac{1}{2} \\
&= \tfrac{1}{2}.
\end{aligned}$$

Here the only weight function is $\omega$ with $\omega(s_2, t_3) = \omega(s_3, t_3) = \frac{1}{2}$. It follows
that $d_s(s_1, t_1) = \frac{1}{2}$. Similarly, we get $d_s(s_1, t_2) = \frac{1}{2}$. Then it is not difficult
to see that

$$K(d_s)(\bar{s}_1, \tfrac{1}{2}\bar{t}_1 + \tfrac{1}{2}\bar{t}_2) = d_s(s_1, t_1) \cdot \tfrac{1}{2} + d_s(s_1, t_2) \cdot \tfrac{1}{2} = \tfrac{1}{2}$$

from which we finally obtain $d_s(s,t) = \frac{1}{2}$.

Figure 1: $d_s(s,t) = \frac{1}{2}$


The above coinductively defined bisimilarity metric can be reformulated
as a fixed point of a monotone functional operator. Let us define the
functional operator $F_s : [0,1]^{S \times S} \to [0,1]^{S \times S}$ for $d : S \times S \to [0,1]$ and
$s,t \in S$ by

$$F_s(d)(s,t) = \sup_{a \in A} \{ H(K(d))(der(s,a), der(t,a)) \}. \tag{3}$$

It can be shown that $F_s$ is monotone and its least fixed point is given by
$\bigsqcup_{i \in \mathbb{N}} d_i$, where $d_0 = \mathbf{0}$ and $d_{i+1} = F_s(d_i)$ for all $i \in \mathbb{N}$.


Proposition 3.1 $d_s$ is the least fixed point of $F_s$.


Essentially the same property as Proposition 3.1 has appeared in [18]. 

Now we proceed by defining a real-valued modal logic based on Hennessy- 
Milner logic [30], called metric HML, to characterise the bisimilarity metric. 
It is motivated by [33, 17, 18, 5]. 


Definition 3.2 Our metric HML is two-sorted and has the following syntax:

$$\begin{aligned}
\varphi &::= \top \mid \neg\varphi \mid \varphi \ominus p \mid \varphi_1 \wedge \varphi_2 \mid \langle a \rangle \psi \\
\psi &::= [\varphi] \mid \neg\psi \mid \psi \ominus p \mid \psi_1 \wedge \psi_2
\end{aligned}$$

with $a \in A$ and $p \in [0,1]$.


Let $\mathcal{L}$ denote the set of all metric HML formulae, $\varphi$ range over the set of
all state formulae $\mathcal{L}^S$, and $\psi$ range over the set of all distribution formulae
$\mathcal{L}^D$. The two kinds of formulae are defined simultaneously. The operator
$\varphi \ominus p$ tests if a state passes $\varphi$ with probability at least $p$. Each state
formula $\varphi$ immediately induces a distribution formula $[\varphi]$. Sometimes we
abbreviate $\langle a \rangle [\varphi]$ as $\langle a \rangle \varphi$. Other operators such as negation, conjunction,
and the diamond operator come from the classical HML, but will be given a
quantitative interpretation.


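Definition 3.3 A state formula $\varphi \in \mathcal{L}^S$ evaluates in $s \in S$ as follows:

$$\begin{aligned}
[\![\top]\!](s) &= 1 \\
[\![\neg\varphi]\!](s) &= 1 - [\![\varphi]\!](s) \\
[\![\varphi \ominus p]\!](s) &= \max([\![\varphi]\!](s) - p,\ 0) \\
[\![\varphi_1 \wedge \varphi_2]\!](s) &= \min([\![\varphi_1]\!](s),\ [\![\varphi_2]\!](s)) \\
[\![\langle a \rangle \psi]\!](s) &= \max_{s \xrightarrow{a} \Delta} [\![\psi]\!](\Delta)
\end{aligned}$$

with $\max \emptyset = 0$, and a distribution formula $\psi \in \mathcal{L}^D$ evaluates in $\Delta \in D(S)$
as follows:

$$\begin{aligned}
[\![[\varphi]]\!](\Delta) &= \sum_{s \in S} \Delta(s) \cdot [\![\varphi]\!](s) \\
[\![\neg\psi]\!](\Delta) &= 1 - [\![\psi]\!](\Delta) \\
[\![\psi \ominus p]\!](\Delta) &= \max([\![\psi]\!](\Delta) - p,\ 0) \\
[\![\psi_1 \wedge \psi_2]\!](\Delta) &= \min([\![\psi_1]\!](\Delta),\ [\![\psi_2]\!](\Delta)).
\end{aligned}$$

We often use constant formulae, e.g. $\underline{p}$ for any $p \in [0,1]$ with the
semantics $[\![\underline{p}]\!](s) = p$, which is derivable in the above logic by letting
$\underline{p} = \top \ominus (1 - p)$. Moreover, we write $\varphi \oplus p$ for $\neg((\neg\varphi) \ominus p)$ with the semantics
$[\![\varphi \oplus p]\!](s) = \min([\![\varphi]\!](s) + p,\ 1) = 1 - \max(1 - [\![\varphi]\!](s) - p,\ 0)$. In the presence
of negation and conjunction we can derive disjunction by letting $\varphi_1 \vee \varphi_2$
be $\neg(\neg\varphi_1 \wedge \neg\varphi_2)$. Intuitively, $[\![\varphi]\!](s)$ measures the degree that formula
$\varphi$ is satisfied by state $s$; similarly for distribution formulae. Therefore,
negation is naturally interpreted as complement, conjunction as minimum
and disjunction as maximum‡. The formula $\langle a \rangle \psi$ specifies the property for
a state to perform action $a$ and result in a possible distribution to satisfy
$\psi$. In the presence of nondeterminism, from state $s$ there may be several
outgoing transitions labelled by the same action $a$, e.g. $s \xrightarrow{a} \Delta_i$ with $i \in I$.
We take the optimal case by taking $[\![\langle a \rangle \psi]\!](s)$ to be the maximal $[\![\psi]\!](\Delta_i)$
when $i$ ranges over $I$.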
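The derived operators can be checked numerically at the level of truth values. The sketch below works directly on values in $[0,1]$ rather than on formulae; the function names (`neg`, `ominus`, `conj`, `const`, `oplus`, `disj`) are assumptions introduced for this illustration.

```python
from fractions import Fraction as F

# Primitive operators of the logic, acting on values in [0, 1].
def neg(v):
    return 1 - v

def ominus(v, p):
    return max(v - p, 0)

def conj(v, w):
    return min(v, w)

# Derived operators, built only from the primitives above.
def const(p):
    """Value of the constant formula, defined as T (-) (1 - p) where T has value 1."""
    return ominus(F(1), 1 - p)

def oplus(v, p):
    """Value of phi (+) p, defined as not((not phi) (-) p)."""
    return neg(ominus(neg(v), p))

def disj(v, w):
    """Value of phi1 or phi2, defined as not(not phi1 and not phi2)."""
    return neg(conj(neg(v), neg(w)))

grid = [F(i, 10) for i in range(11)]   # sample points 0, 0.1, ..., 1
```

On this grid the derived operators agree with the closed forms stated above: `const(p)` returns `p`, `oplus(v, p)` returns `min(v + p, 1)`, and `disj(v, w)` returns `max(v, w)`.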

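The above metric HML induces two natural logical metrics $d^l_s$ and $d^l_d$
on states and distributions respectively, by letting

$$\begin{aligned}
d^l_s(s, t) &= \sup_{\varphi \in \mathcal{L}^S} |[\![\varphi]\!](s) - [\![\varphi]\!](t)| \\
d^l_d(\Delta, \Theta) &= \sup_{\psi \in \mathcal{L}^D} |[\![\psi]\!](\Delta) - [\![\psi]\!](\Theta)|.
\end{aligned}$$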


Remark 3.1 In the above definition, we can also write

$$d^l_s(s,t) = \sup_{\varphi \in \mathcal{L}^S} ([\![\varphi]\!](s) - [\![\varphi]\!](t)) \tag{4}$$

because if $[\![\varphi]\!](s) < [\![\varphi]\!](t)$ then we can take the negation of $\varphi$ so as to obtain

$$[\![\neg\varphi]\!](s) - [\![\neg\varphi]\!](t) = (1 - [\![\varphi]\!](s)) - (1 - [\![\varphi]\!](t)) = |[\![\varphi]\!](s) - [\![\varphi]\!](t)|.$$

However, this heavily relies on our semantic interpretation of the negation
operator, and we decide not to use (4) as a definition. Similarly for $d^l_d(\Delta, \Theta)$.


Example 3.2 Consider the two probabilistic systems depicted in Figure 2.
We have the formula $\varphi = \langle a \rangle \psi$ where $\psi = [\langle a \rangle \top] \wedge [\langle b \rangle \top]$ and would like to
know the difference between $s$ and $t$ given by $\varphi$. Let

$$\Delta_3 = 0.5 \cdot \bar{s}_1 + 0.5 \cdot \bar{s}_2.$$

Note that $[\![\langle a \rangle \top]\!](s_1) = 1$ and $[\![\langle a \rangle \top]\!](s_2) = 0$. Then

$$[\![[\langle a \rangle \top]]\!](\Delta_1) = 0.2 \cdot [\![\langle a \rangle \top]\!](s_1) + 0.8 \cdot [\![\langle a \rangle \top]\!](s_2) = 0.2.$$

Similarly, $[\![[\langle b \rangle \top]]\!](\Delta_1) = 0.8$. It follows that

$$[\![\psi]\!](\Delta_1) = \min([\![[\langle a \rangle \top]]\!](\Delta_1),\ [\![[\langle b \rangle \top]]\!](\Delta_1)) = 0.2.$$

With similar arguments, we see that $[\![\psi]\!](\Delta_2) = 0.2$ and $[\![\psi]\!](\Delta_3) = 0.5$.
Therefore, we can calculate that

$$\begin{aligned}
[\![\varphi]\!](s) &= \max([\![\psi]\!](\Delta_1),\ [\![\psi]\!](\Delta_2)) = 0.2 \\
[\![\varphi]\!](t) &= \max([\![\psi]\!](\Delta_1),\ [\![\psi]\!](\Delta_2),\ [\![\psi]\!](\Delta_3)) = 0.5.
\end{aligned}$$

So the difference between $s$ and $t$ with respect to $\varphi$ is $|[\![\varphi]\!](s) - [\![\varphi]\!](t)| = 0.3$.
In fact we also have $d^l_s(s,t) = 0.3$.

Figure 2: $d^l_s(s,t) = 0.3$

‡ Since we will compare our logic with that in [18], it is better for our semantic
interpretation to be consistent with that in the aforementioned work. In the literature,
there are also other ways of interpreting conjunction and disjunction in probabilistic
settings, see e.g. [32, 4].
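The semantics of Definition 3.3 and the calculation in Example 3.2 can be sketched as a small recursive evaluator. Formulae are encoded as nested tuples, which is purely a choice made for this illustration. Since Figure 2 is not reproduced here, parts of the system below are assumptions: $\Delta_2$ is taken to be $0.8 \cdot \bar{s}_1 + 0.2 \cdot \bar{s}_2$ (only its value under $\psi$ is stated in the example), and the target state `end` of the transitions from $s_1$ and $s_2$ is hypothetical.

```python
from fractions import Fraction as F

def eval_state(plts, phi, s):
    """Value of a state formula phi at state s."""
    tag = phi[0]
    if tag == "top":
        return F(1)
    if tag == "neg":
        return 1 - eval_state(plts, phi[1], s)
    if tag == "minus":                      # phi' (-) p
        return max(eval_state(plts, phi[1], s) - phi[2], F(0))
    if tag == "and":
        return min(eval_state(plts, phi[1], s), eval_state(plts, phi[2], s))
    if tag == "dia":                        # <a> psi, with max over the empty set = 0
        succs = plts.get(s, {}).get(phi[1], [])
        return max((eval_dist(plts, phi[2], d) for d in succs), default=F(0))
    raise ValueError(f"unknown state formula: {tag}")

def eval_dist(plts, psi, delta):
    """Value of a distribution formula psi at distribution delta."""
    tag = psi[0]
    if tag == "lift":                       # [phi]: expected value of phi under delta
        return sum((p * eval_state(plts, psi[1], s) for s, p in delta.items()), F(0))
    if tag == "neg":
        return 1 - eval_dist(plts, psi[1], delta)
    if tag == "minus":
        return max(eval_dist(plts, psi[1], delta) - psi[2], F(0))
    if tag == "and":
        return min(eval_dist(plts, psi[1], delta), eval_dist(plts, psi[2], delta))
    raise ValueError(f"unknown distribution formula: {tag}")

end = {"end": F(1)}                          # hypothetical deadlock target
d1 = {"s1": F(1, 5), "s2": F(4, 5)}          # Delta_1, as used in the example
d2 = {"s1": F(4, 5), "s2": F(1, 5)}          # Delta_2, assumed for illustration
d3 = {"s1": F(1, 2), "s2": F(1, 2)}          # Delta_3
plts = {
    "s": {"a": [d1, d2]},
    "t": {"a": [d1, d2, d3]},
    "s1": {"a": [end]},
    "s2": {"b": [end]},
    "end": {},
}

top = ("lift", ("top",))                     # <a>T abbreviates <a>[T]
psi = ("and", ("lift", ("dia", "a", top)), ("lift", ("dia", "b", top)))
phi = ("dia", "a", psi)
gap = eval_state(plts, phi, "t") - eval_state(plts, phi, "s")
```

Under these assumptions the evaluator reproduces the values 0.2 and 0.5 for $s$ and $t$, and hence the difference 0.3 computed in the example. Exact rationals are used to avoid floating-point rounding in the comparison.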


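In the presence of testing operators in state formulae, one might wonder
if the testing operators in distribution formulae can be removed. Unfortunately,
this is not the case, as indicated by the following example.

Example 3.3 At first sight the following two equations seem to be sound.

$$[\![[\varphi] \ominus p]\!](\Delta) = [\![[\varphi \ominus p]]\!](\Delta) \quad\text{and}\quad [\![\psi]\!]\Big(\sum_{i \in I} p_i \cdot \Delta_i\Big) = \sum_{i \in I} p_i \cdot [\![\psi]\!](\Delta_i)$$

However, in general they do not hold, as witnessed by the counterexamples
below. Let $\varphi = \langle b \rangle \top$, $\psi = [\varphi] \ominus 0.5$ and the distribution $\Delta_1$ be the same as
in Example 3.2. Then we have

$$\begin{aligned}
[\![[\varphi] \ominus 0.5]\!](\Delta_1) &= \max([\![[\varphi]]\!](\Delta_1) - 0.5,\ 0) \\
&= \max(0.2 \cdot [\![\langle b \rangle \top]\!](s_1) + 0.8 \cdot [\![\langle b \rangle \top]\!](s_2) - 0.5,\ 0) \\
&= \max(0.2 \cdot 0 + 0.8 \cdot 1 - 0.5,\ 0) \\
&= 0.3
\end{aligned}$$

$$\begin{aligned}
[\![[\varphi \ominus 0.5]]\!](\Delta_1) &= 0.2 \cdot [\![\varphi \ominus 0.5]\!](s_1) + 0.8 \cdot [\![\varphi \ominus 0.5]\!](s_2) \\
&= 0.2 \cdot \max([\![\varphi]\!](s_1) - 0.5,\ 0) + 0.8 \cdot \max([\![\varphi]\!](s_2) - 0.5,\ 0) \\
&= 0.2 \cdot \max(0 - 0.5,\ 0) + 0.8 \cdot \max(1 - 0.5,\ 0) \\
&= 0.4
\end{aligned}$$

$$\begin{aligned}
0.2 \cdot [\![\psi]\!](\bar{s}_1) + 0.8 \cdot [\![\psi]\!](\bar{s}_2) &= 0.2 \cdot [\![[\varphi] \ominus 0.5]\!](\bar{s}_1) + 0.8 \cdot [\![[\varphi] \ominus 0.5]\!](\bar{s}_2) \\
&= 0.2 \cdot \max([\![[\varphi]]\!](\bar{s}_1) - 0.5,\ 0) + 0.8 \cdot \max([\![[\varphi]]\!](\bar{s}_2) - 0.5,\ 0) \\
&= 0.2 \cdot \max(0 - 0.5,\ 0) + 0.8 \cdot \max(1 - 0.5,\ 0) \\
&= 0.4
\end{aligned}$$

So we see that $[\![[\varphi] \ominus 0.5]\!](\Delta_1) \ne [\![[\varphi \ominus 0.5]]\!](\Delta_1)$ and
$[\![\psi]\!](\Delta_1) \ne 0.2 \cdot [\![\psi]\!](\bar{s}_1) + 0.8 \cdot [\![\psi]\!](\bar{s}_2)$.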
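The counterexample boils down to the fact that truncated subtraction does not commute with taking expectations. A short check with exact rationals, using only the values stated in the example (the helper names `expect` and `ominus` are assumptions for this sketch):

```python
from fractions import Fraction as F

delta1 = {"s1": F(1, 5), "s2": F(4, 5)}      # Delta_1 = 0.2 s1 + 0.8 s2
phi = {"s1": F(0), "s2": F(1)}               # values of <b>T at s1 and s2
p = F(1, 2)

def expect(delta, value):
    """Expected value of a state-indexed quantity under distribution delta."""
    return sum(delta[s] * value[s] for s in delta)

def ominus(v, threshold):
    """Semantics of the testing operator on values."""
    return max(v - threshold, F(0))

# Test applied after averaging: value of [phi] (-) 0.5 at Delta_1.
outside = ominus(expect(delta1, phi), p)

# Test applied before averaging: value of [phi (-) 0.5] at Delta_1.
inside = expect(delta1, {s: ominus(v, p) for s, v in phi.items()})
```

The two results differ (3/10 against 2/5), which is why the testing operator on distribution formulae cannot be pushed inside the lifting $[\cdot]$.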


It turns out that the logic $\mathcal{L}$ precisely captures the bisimilarity metric $d_s$:
the metric $d^l_s$ defined by state formulae coincides with $d_s$ and the metric $d^l_d$
defined by distribution formulae coincides with $K(d_s)$, the lifted form of $d_s$.


Theorem 3.1 $d_s = d^l_s$ and $K(d_s) = d^l_d$.


The two properties in Theorem 3.1 are coupled and should be proved
simultaneously because state formulae and distribution formulae are defined
reciprocally. The proof is carried out in three steps:


(i) We show $d^l_s \sqsubseteq d_s$ and $d^l_d \sqsubseteq K(d_s)$ simultaneously by structural
induction on formulae.

(ii) We establish $K(d^l_s) \sqsubseteq d^l_d$ by exploiting the dual form of the
Kantorovich metric in (2). Here it is crucial to require the state space
of the pLTS under consideration to be finite in order to use binary
conjunctions rather than infinitary conjunctions. The negation and
testing operators in state formulae play an important role in the proof.

(iii) We verify that $d^l_s$ is a state-based bisimulation metric and so obtain
$d_s \sqsubseteq d^l_s$. This part is based on (ii) and requires the pLTS to be
image-finite. Its proof makes use of the negation and testing operators
in distribution formulae.


We follow the above guideline and decompose Theorem 3.1 into three 
technical lemmas. 


Lemma 3.1 1. $d^l_s \sqsubseteq d_s$

2. $d^l_d \sqsubseteq K(d_s)$


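Proof: We show the two statements simultaneously by structural induction
on formulae. For any two states $s,t \in S$ and distributions $\Delta_1, \Delta_2 \in D(S)$,
we prove that

(i) $|[\![\varphi]\!](s) - [\![\varphi]\!](t)| \le d_s(s,t)$ for all $\varphi \in \mathcal{L}^S$;
(ii) $|[\![\psi]\!](\Delta_1) - [\![\psi]\!](\Delta_2)| \le K(d_s)(\Delta_1, \Delta_2)$ for all $\psi \in \mathcal{L}^D$.

We first analyze the structure of $\varphi$ in (i).

• $\varphi = \top$. Then it is trivial to see that $|[\![\varphi]\!](s) - [\![\varphi]\!](t)| = |1 - 1| = 0 \le d_s(s,t)$.

• $\varphi = \neg\varphi'$. Then $|[\![\varphi]\!](s) - [\![\varphi]\!](t)| = |[\![\varphi']\!](t) - [\![\varphi']\!](s)| \le d_s(s,t)$ where
the inequality holds by induction.

• $\varphi = \varphi' \ominus p$. There are four subcases and we consider one of them.
Suppose $[\![\varphi']\!](s) > p$ and $[\![\varphi']\!](t) \le p$, then
$|[\![\varphi]\!](s) - [\![\varphi]\!](t)| = |[\![\varphi']\!](s) - p| \le |[\![\varphi']\!](s) - [\![\varphi']\!](t)| \le d_s(s,t)$ by induction.

• $\varphi = \varphi_1 \wedge \varphi_2$. Without loss of generality we assume that $[\![\varphi]\!](s) \ge [\![\varphi]\!](t)$.
There are two possibilities:

  - If $[\![\varphi_1]\!](t) \le [\![\varphi_2]\!](t)$, then $[\![\varphi]\!](s) - [\![\varphi]\!](t) \le [\![\varphi_1]\!](s) - [\![\varphi_1]\!](t) \le d_s(s,t)$,
    where the last inequality holds by induction.

  - Symmetrically, if $[\![\varphi_2]\!](t) \le [\![\varphi_1]\!](t)$, then
    $[\![\varphi]\!](s) - [\![\varphi]\!](t) \le [\![\varphi_2]\!](s) - [\![\varphi_2]\!](t) \le d_s(s,t)$.

• $\varphi = \langle a \rangle \psi$. If either $s$ or $t$ cannot perform action $a$, the expected result
is straightforward. So we consider the non-trivial case that both $s$ and $t$
can perform action $a$. Let $\Delta_1$ be a distribution such that $s \xrightarrow{a} \Delta_1$ and
$[\![\langle a \rangle \psi]\!](s) = [\![\psi]\!](\Delta_1)$. Since $d_s$ is a state-based bisimulation metric, by
definition there exists some $\Delta_2$ such that $t \xrightarrow{a} \Delta_2$ and

$$K(d_s)(\Delta_1, \Delta_2) \le d_s(s,t). \tag{5}$$

Without loss of generality we assume that $[\![\varphi]\!](s) \ge [\![\varphi]\!](t)$. It follows
that

$$\begin{aligned}
[\![\varphi]\!](s) - [\![\varphi]\!](t) &= [\![\psi]\!](\Delta_1) - \max_{t \xrightarrow{a} \Delta} [\![\psi]\!](\Delta) \\
&\le [\![\psi]\!](\Delta_1) - [\![\psi]\!](\Delta_2) \\
&\le K(d_s)(\Delta_1, \Delta_2) && \text{by induction on } \psi \\
&\le d_s(s,t) && \text{by (5)}
\end{aligned}$$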

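Then we analyze the structure of $\psi$ in (ii).

• $\psi = [\varphi]$ for some $\varphi \in \mathcal{L}^S$. Without loss of generality we assume that
$[\![\psi]\!](\Delta_1) \ge [\![\psi]\!](\Delta_2)$. We infer that

$$\begin{aligned}
[\![\psi]\!](\Delta_1) - [\![\psi]\!](\Delta_2) &= [\![[\varphi]]\!](\Delta_1) - [\![[\varphi]]\!](\Delta_2) \\
&= \sum_{u \in S} (\Delta_1(u) - \Delta_2(u)) [\![\varphi]\!](u) \\
&\le \max\Big\{ \sum_{u \in S} (\Delta_1(u) - \Delta_2(u)) x_u \ \Big|\ x_u, x_{u'} \in [0,1] \wedge x_u - x_{u'} \le d_s(u,u') \Big\} \\
&= K(d_s)(\Delta_1, \Delta_2)
\end{aligned}$$

where the last equality holds because of the Kantorovich-Rubinstein
duality theorem [34, 48] and the last inequality holds because for any
states $u,u' \in S$ we have $[\![\varphi]\!](u), [\![\varphi]\!](u') \in [0,1]$ and
$|[\![\varphi]\!](u) - [\![\varphi]\!](u')| \le d_s(u,u')$ by induction.

• $\psi = \psi_1 \wedge \psi_2$. Without loss of generality we assume that
$[\![\psi]\!](\Delta_1) \ge [\![\psi]\!](\Delta_2)$. There are two possibilities:

  - If $[\![\psi_1]\!](\Delta_2) \le [\![\psi_2]\!](\Delta_2)$, then
    $[\![\psi]\!](\Delta_1) - [\![\psi]\!](\Delta_2) \le [\![\psi_1]\!](\Delta_1) - [\![\psi_1]\!](\Delta_2) \le K(d_s)(\Delta_1, \Delta_2)$,
    where the last inequality holds by induction.

  - Symmetrically, if $[\![\psi_2]\!](\Delta_2) \le [\![\psi_1]\!](\Delta_2)$, then
    $[\![\psi]\!](\Delta_1) - [\![\psi]\!](\Delta_2) \le [\![\psi_2]\!](\Delta_1) - [\![\psi_2]\!](\Delta_2) \le K(d_s)(\Delta_1, \Delta_2)$.

• $\psi = \neg\psi'$ or $\psi' \ominus p$. Similar to the proof by induction of the last case.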


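Lemma 3.2 $K(d^l_s) \sqsubseteq d^l_d$


Proof: Let $\Delta_1, \Delta_2$ be any two distributions in $D(S)$. We aim to show
that

$$K(d^l_s)(\Delta_1, \Delta_2) \le \sup_{\psi \in \mathcal{L}^D} |[\![\psi]\!](\Delta_1) - [\![\psi]\!](\Delta_2)|. \tag{6}$$

Let $L(\Delta_1, \Delta_2)$ be the optimal value of the following linear program

$$\begin{aligned}
\max\ & \sum_{s \in S} (\Delta_1(s) - \Delta_2(s))\, x_s \\
\text{subject to}\ & 0 \le x_s \le 1, \\
& \forall s,t \in S:\ x_s - x_t \le d^l_s(s,t).
\end{aligned} \tag{7}$$

Let $\{k_s\}_{s \in S}$ be a set of real numbers in the interval $[0,1]$ that maximize
the above linear program to reach $L(\Delta_1, \Delta_2)$. We first consider the special
case that $k_s = 1$ for all $s \in S$. Then the maximum value of the linear
program in (7) is

$$\sum_{s \in S} (\Delta_1(s) - \Delta_2(s)) \cdot 1 = \sum_{s \in S} \Delta_1(s) - \sum_{s \in S} \Delta_2(s) = 1 - 1 = 0.$$

It follows that $K(d^l_s)(\Delta_1, \Delta_2) = 0$ and this immediately implies (6).
Now consider the general case that $k_s < 1$ for at least one $s \in S$. We
are going to show (6) by using an idea inspired by [18]. Let

$$e = \min\{1 - k_t \mid k_t < 1 \text{ and } t \in S\}$$

and $\epsilon > 0$ be any positive real number smaller than $e$. Hence, if $t \in S$ and
$k_t < 1$ then

$$k_t + \epsilon < 1. \tag{8}$$

We construct some formula $\psi$ such that

$$L(\Delta_1, \Delta_2) - \epsilon \le [\![\psi]\!](\Delta_1) - [\![\psi]\!](\Delta_2). \tag{9}$$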


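For any $s,t \in S$, we distinguish two cases:

1. If $k_s > k_t$, then $0 < k_s - k_t \le d^l_s(s,t)$. It is easy to see that there
exists some formula $\varphi_{st}$ such that

$$k_s - k_t \le [\![\varphi_{st}]\!](s) - [\![\varphi_{st}]\!](t) + \epsilon \tag{10}$$

or equivalently $[\![\varphi_{st}]\!](t) - [\![\varphi_{st}]\!](s) + k_s \le k_t + \epsilon$. We define a new
formula

$$\varphi'_{st} = \begin{cases} \varphi_{st} \ominus ([\![\varphi_{st}]\!](s) - k_s) & \text{if } [\![\varphi_{st}]\!](s) > k_s \\ \varphi_{st} \oplus (k_s - [\![\varphi_{st}]\!](s)) & \text{otherwise.} \end{cases}$$

Let us compare $[\![\varphi'_{st}]\!](t)$ with $k_t$.

(a) If $[\![\varphi_{st}]\!](s) > k_s$, then

$$\begin{aligned}
[\![\varphi'_{st}]\!](t) &= \max([\![\varphi_{st}]\!](t) - [\![\varphi_{st}]\!](s) + k_s,\ 0) \\
&\le \max(k_t + \epsilon,\ 0) && \text{by (10)} \\
&= k_t + \epsilon
\end{aligned}$$

(b) Otherwise, we have $[\![\varphi'_{st}]\!](t) = \min([\![\varphi_{st}]\!](t) + k_s - [\![\varphi_{st}]\!](s),\ 1)$ by
definition. By (10) we infer the inequality that
$[\![\varphi_{st}]\!](t) + k_s - [\![\varphi_{st}]\!](s) \le k_t + \epsilon$. It follows that $[\![\varphi'_{st}]\!](t) \le k_t + \epsilon$.

In both (a) and (b) we have $[\![\varphi'_{st}]\!](t) \le k_t + \epsilon$, and it is also easy to see
that $[\![\varphi'_{st}]\!](s) = k_s$.

2. If $k_s \le k_t$, then we simply set $\varphi'_{st}$ to be the constant formula $\underline{k_s}$. As in the last
case, we have $[\![\varphi'_{st}]\!](s) = k_s$ and $[\![\varphi'_{st}]\!](t) = k_s \le k_t < k_t + \epsilon$.

In summary, the above reasoning says that for any $s,t \in S$ we can
construct a formula $\varphi'_{st}$ such that $[\![\varphi'_{st}]\!](s) = k_s$ and $[\![\varphi'_{st}]\!](t) \le k_t + \epsilon$. Now
let us define $\varphi'_s = \bigwedge_{t \in S} \varphi'_{st}$. It is easy to see that $[\![\varphi'_s]\!](s) = k_s$ and
$[\![\varphi'_s]\!](t) \le k_t + \epsilon$ for all $t \in S$. The latter implies $\max\{[\![\varphi'_s]\!](t) \mid s \in S\} \le k_t + \epsilon$. Then
define $\varphi = \bigvee_{s \in S} \varphi'_s$. For all $t \in S$, we have

$$k_t = [\![\varphi'_t]\!](t) \le [\![\varphi]\!](t) = \max\{[\![\varphi'_s]\!](t) \mid s \in S\} \le k_t + \epsilon.$$

Finally, we define $\psi = [\varphi]$. It follows that

$$\begin{aligned}
[\![\psi]\!](\Delta_1) - [\![\psi]\!](\Delta_2) &= [\![[\varphi]]\!](\Delta_1) - [\![[\varphi]]\!](\Delta_2) \\
&= \sum_{t \in S} \Delta_1(t) \cdot [\![\varphi]\!](t) - \sum_{t \in S} \Delta_2(t) \cdot [\![\varphi]\!](t) \\
&\ge \sum_{t \in S} \Delta_1(t) \cdot k_t - \sum_{t \in S} \Delta_2(t) \cdot [\![\varphi]\!](t) \\
&\ge \sum_{t \in S} \Delta_1(t) \cdot k_t - \sum_{t \in S} \Delta_2(t) \cdot (k_t + \epsilon) \\
&= \sum_{t \in S} (\Delta_1(t) - \Delta_2(t)) \cdot k_t - \sum_{t \in S} \Delta_2(t) \cdot \epsilon \\
&= L(\Delta_1, \Delta_2) - \epsilon
\end{aligned}$$

as required in (9).
The above property will be used to prove the following lemma.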


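Lemma 3.3 $d_s \sqsubseteq d^l_s$


Proof: We show that $d^l_s$ is a state-based bisimulation metric. Let
$s,t$ be any two states in $S$ and $\epsilon$ be any real number in the interval $[0,1)$
with $d^l_s(s,t) \le \epsilon$. Assume that $s \xrightarrow{a} \Delta_1$ is an arbitrarily chosen transition
from $s$. Then state $t$ must be able to perform action $a$ too. Otherwise
it is easy to see that $d^l_s(s,t) = 1 > \epsilon$, which contradicts our assumption
above. We need to show that there exists some transition $t \xrightarrow{a} \Delta_2$ with
$K(d^l_s)(\Delta_1, \Delta_2) \le \epsilon$. Suppose for a contradiction that no $a$-transition from $t$
satisfies this condition. In other words, for each $\Delta^i_2$ with $t \xrightarrow{a} \Delta^i_2$ we have
$K(d^l_s)(\Delta_1, \Delta^i_2) > \epsilon$. By Lemma 3.2, this means $d^l_d(\Delta_1, \Delta^i_2) > \epsilon$. Then
there must exist some formula $\psi^i_2 \in \mathcal{L}^D$ such that $|[\![\psi^i_2]\!](\Delta_1) - [\![\psi^i_2]\!](\Delta^i_2)| > \epsilon$.
Furthermore, we can strengthen this condition to the following one

$$[\![\psi^i_2]\!](\Delta_1) - [\![\psi^i_2]\!](\Delta^i_2) > \epsilon \tag{11}$$

because we can take the formula $\neg\psi^i_2$ in place of $\psi^i_2$ in the case that
$[\![\psi^i_2]\!](\Delta_1) < [\![\psi^i_2]\!](\Delta^i_2)$. Let

$$\varphi = \langle a \rangle \bigwedge_i (\psi^i_2 \ominus [\![\psi^i_2]\!](\Delta^i_2)).$$

We infer that

$$\begin{aligned}
[\![\varphi]\!](s) &= \max_{s \xrightarrow{a} \Delta} [\![\textstyle\bigwedge_i (\psi^i_2 \ominus [\![\psi^i_2]\!](\Delta^i_2))]\!](\Delta) \\
&\ge [\![\textstyle\bigwedge_i (\psi^i_2 \ominus [\![\psi^i_2]\!](\Delta^i_2))]\!](\Delta_1) \\
&= \min_i [\![\psi^i_2 \ominus [\![\psi^i_2]\!](\Delta^i_2)]\!](\Delta_1) \\
&= [\![\psi^k_2 \ominus [\![\psi^k_2]\!](\Delta^k_2)]\!](\Delta_1) && \text{for some } k \\
&= \max([\![\psi^k_2]\!](\Delta_1) - [\![\psi^k_2]\!](\Delta^k_2),\ 0) \\
&> \epsilon && \text{by (11)}
\end{aligned}$$

On the other hand, we have

$$\begin{aligned}
[\![\varphi]\!](t) &= \max_{t \xrightarrow{a} \Delta^j_2} [\![\textstyle\bigwedge_i (\psi^i_2 \ominus [\![\psi^i_2]\!](\Delta^i_2))]\!](\Delta^j_2) \\
&= \max_{t \xrightarrow{a} \Delta^j_2} \min_i [\![\psi^i_2 \ominus [\![\psi^i_2]\!](\Delta^i_2)]\!](\Delta^j_2) \\
&= \max_{t \xrightarrow{a} \Delta^j_2} \min_i \max([\![\psi^i_2]\!](\Delta^j_2) - [\![\psi^i_2]\!](\Delta^i_2),\ 0) \\
&= 0
\end{aligned}$$

It follows that $d^l_s(s,t) \ge [\![\varphi]\!](s) - [\![\varphi]\!](t) > \epsilon$, which gives rise to a
contradiction.

Finally, we obtain a proof of Theorem 3.1.
Proof: By combining the last three technical lemmas.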


Remark 3.2 In the proof of Lemma 3.3 we have constructed the formula

$$\varphi = \langle a \rangle \bigwedge_i (\psi^i_2 \ominus [\![\psi^i_2]\!](\Delta^i_2)) \tag{12}$$

by making use of conjunction and minus connectives for distribution
formulae. This happens because in the presence of non-determinism state $t$ may
perform action $a$ and then evolve into one of several successor distributions
$\Delta^i_2$. If we confine ourselves to deterministic pLTSs, then state $t$ will have
a unique successor distribution $\Delta^i_2$ and therefore (12) can be simplified as
$\varphi = \langle a \rangle \psi^i_2$. In this case, there is no need of conjunction and minus
connectives for distribution formulae. That is, distribution formulae are in
the form $[\varphi]$ or $\neg[\varphi]$. Furthermore, if we fold them into state formulae in
Definition 3.2, distribution formulae can be completely dropped. In other
words, for deterministic pLTSs, the state-based bisimilarity metric can be
characterised by the following one-sorted metric logic

$$\varphi ::= \top \mid \neg\varphi \mid \varphi \ominus p \mid \varphi_1 \wedge \varphi_2 \mid \langle a \rangle \varphi. \tag{13}$$

Therefore, for deterministic pLTSs, the two-sorted logic in Definition 3.2
degenerates into the logic considered in [17, 50, 26], as expected. In the
one-sorted logic, the formula $\langle a \rangle (\varphi \ominus p)$ will be interpreted the same as the
formula $\langle a \rangle [\varphi \ominus p]$ in $\mathcal{L}^S$, but no formula has the same interpretation as
$\langle a \rangle ([\varphi] \ominus p)$ in $\mathcal{L}^S$; the subtlety has already been discussed in Example 3.3.
In [8, 8] a bisimulation metric for game structures is characterised by a
quantitative $\mu$-calculus where formulae are evaluated also on states and no
distribution formula is needed. This is not surprising because the considered
2-player games are deterministic: at any state $s$, if two players have chosen
their moves, say $a_1$ and $a_2$, then there is a unique distribution $\delta(s, a_1, a_2)$ to
determine the probabilities of arriving at a set of destination states.


4 Distribution-Based Bisimulation Metric 


The bisimilarity metric given in Definition 3.1 measures the distance between
two states. Alternatively, it is possible to directly define a metric that
measures subdistributions. In order to do so, we first define a transition
relation between subdistributions.


Definition 4.1 With a slight abuse of notation, we also use the notation
$\xrightarrow{a}$ to stand for the transition relation between subdistributions, which is the
smallest relation satisfying the three rules given in Figure 3.


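$$\frac{s \xrightarrow{a} \Delta}{\bar{s} \xrightarrow{a} \Delta} \qquad\qquad \frac{s \not\xrightarrow{a}}{\bar{s} \xrightarrow{a} \varepsilon}$$

$$\frac{\forall i \in I.\ p_i \ge 0 \wedge \Delta_i \xrightarrow{a} \Theta_i \qquad I \text{ is finite} \qquad \sum_{i \in I} p_i \le 1}{\big(\sum_{i \in I} p_i \cdot \Delta_i\big) \xrightarrow{a} \big(\sum_{i \in I} p_i \cdot \Theta_i\big)}$$

Figure 3: Rules for transitions between subdistributions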


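Note that if $\Delta \xrightarrow{a} \Delta'$ then not necessarily all the states in the support
of $\Delta$ can perform action $a$. For example, consider the two states $s_2$ and $s_3$ in
Figure 1. Since $s_2 \xrightarrow{c} \bar{s}_4$ and $s_3$ cannot perform action $c$, the distribution
$\Delta = \frac{1}{2}\bar{s}_2 + \frac{1}{2}\bar{s}_3$ can make the transition $\Delta \xrightarrow{c} \frac{1}{2}\bar{s}_4$ to reach the subdistribution $\frac{1}{2}\bar{s}_4$.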
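The way mass is lost when some states in the support cannot perform the action can be sketched as follows. The function `lift_step` is an illustrative assumption: it builds one particular transition of a subdistribution by letting every state in the support either follow a single chosen successor or contribute the empty subdistribution. The transition relation of Definition 4.1 also admits combinations of several successors of the same state, which this sketch does not enumerate.

```python
from fractions import Fraction as F

def lift_step(plts, delta, a, choose=lambda succs: succs[0]):
    """One transition delta --a--> result, as a dict state -> probability."""
    result = {}
    for s, p in delta.items():
        succs = plts.get(s, {}).get(a, [])
        if not succs:
            continue                     # s cannot do a: contributes the empty subdistribution
        for t, q in choose(succs).items():
            result[t] = result.get(t, 0) + p * q
    return result

def mass(delta):
    """Total mass |delta| of a subdistribution."""
    return sum(delta.values())

# Only the facts stated in the text: s2 has a c-transition to s4, s3 has none.
plts = {"s2": {"c": [{"s4": F(1)}]}, "s3": {}, "s4": {}}
delta = {"s2": F(1, 2), "s3": F(1, 2)}
after_c = lift_step(plts, delta, "c")
```

Here `after_c` carries only half of the original mass, matching the transition to $\frac{1}{2}\bar{s}_4$ described above.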
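Lemma 4.1 For any subdistribution $\Delta \in D_{sub}(S)$ and action $a$, if $\Delta \xrightarrow{a} \Delta'$
then there exist some subdistributions $\Delta_s$ such that

1. $\Delta' = \sum_{s \in \lceil\Delta\rceil} \Delta(s) \cdot \Delta_s$;
2. $\bar{s} \xrightarrow{a} \Delta_s$ for each $s \in \lceil\Delta\rceil$;
3. if $s \not\xrightarrow{a}$ then $\Delta_s = \varepsilon$.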

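Proof: By induction on the rules of inferring $\Delta \xrightarrow{a} \Delta'$. As displayed
in Figure 3, there are three rules. The first two are straightforward, so we
assume that $\Delta \xrightarrow{a} \Delta'$ is derived from the last one. Suppose $\Delta = \sum_{i \in I} p_i \cdot \Delta_i$,
$\Delta' = \sum_{i \in I} p_i \cdot \Delta'_i$, $\sum_{i \in I} p_i \le 1$ and for all $i \in I$ we have $\Delta_i \xrightarrow{a} \Delta'_i$.
By induction hypothesis, for each $i \in I$, the subdistribution $\Delta'_i$ can be
decomposed as $\Delta'_i = \sum_{s \in \lceil\Delta_i\rceil} \Delta_i(s) \cdot \Delta_{is}$ with $\bar{s} \xrightarrow{a} \Delta_{is}$ for each $s \in \lceil\Delta_i\rceil$,
and $\Delta_{is} = \varepsilon$ if $s \not\xrightarrow{a}$. Note that $\Delta(s) = \sum_{i \in I} p_i \cdot \Delta_i(s)$. It follows that

$$\bar{s} = \sum_{i \in I} \frac{p_i \Delta_i(s)}{\Delta(s)} \cdot \bar{s} \ \xrightarrow{a}\ \sum_{i \in I} \frac{p_i \Delta_i(s)}{\Delta(s)} \cdot \Delta_{is}.$$

Now let $\Delta_s = \sum_{i \in I} \frac{p_i \Delta_i(s)}{\Delta(s)} \cdot \Delta_{is}$. The above transition can be written as
$\bar{s} \xrightarrow{a} \Delta_s$. We also observe that

$$\Delta' = \sum_{i \in I} p_i \cdot \sum_{s \in \lceil\Delta_i\rceil} \Delta_i(s) \cdot \Delta_{is} = \sum_{s \in \lceil\Delta\rceil} \sum_{i \in I} p_i \Delta_i(s) \cdot \Delta_{is} = \sum_{s \in \lceil\Delta\rceil} \Delta(s) \cdot \Delta_s.$$

Moreover, if $s \not\xrightarrow{a}$ then $\Delta_{is} = \varepsilon$ and thus $\Delta_s = \varepsilon$ as required.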


Definition 4.2 A 1-bounded pseudometric $d$ on $D_{sub}(S)$ is a distribution-based
bisimulation metric if, for all $\Delta_1, \Delta_2 \in D_{sub}(S)$, the following two
conditions are satisfied:

1. $|\,|\Delta_1| - |\Delta_2|\,| \le d(\Delta_1, \Delta_2)$;

2. whenever $d(\Delta_1, \Delta_2) < 1$ and $\Delta_1 \xrightarrow{a} \Delta'_1$, then there is some transition
$\Delta_2 \xrightarrow{a} \Delta'_2$ such that $d(\Delta'_1, \Delta'_2) \le d(\Delta_1, \Delta_2)$.


The condition $|\,|\Delta_1| - |\Delta_2|\,| \le d(\Delta_1, \Delta_2)$ says that the distance between
two subdistributions should be at least the difference between their total
masses. The smallest (wrt. $\sqsubseteq$) distribution-based bisimulation metric, notation
$d_d$, is called distribution-based bisimilarity metric. Distribution-based
bisimilarity [14] is the kernel of the distribution-based bisimilarity metric.
Let $der(\Delta, a) = \{\Delta' \mid \Delta \xrightarrow{a} \Delta'\}$. We define the functional operator
$F_d : [0,1]^{D_{sub}(S) \times D_{sub}(S)} \to [0,1]^{D_{sub}(S) \times D_{sub}(S)}$
for $d : D_{sub}(S) \times D_{sub}(S) \to [0,1]$ and $\Delta, \Theta \in D_{sub}(S)$ by

$$F_d(d)(\Delta, \Theta) = \max\Big( \sup_{a \in A} \{ H(d)(der(\Delta,a), der(\Theta,a)) \},\ |\,|\Delta| - |\Theta|\,| \Big). \tag{14}$$

It can be shown that $F_d$ is monotone and its least fixed point is given by $\bigsqcup_{i \in \mathbb{N}} d_i$,
where $d_0(\Delta, \Theta) = |\,|\Delta| - |\Theta|\,|$ for any $\Delta, \Theta \in D_{sub}(S)$ and $d_{i+1} = F_d(d_i)$ for
all $i \in \mathbb{N}$. The property below is analogous to Proposition 3.1.


Proposition 4.1 $d_d$ is the least fixed point of $F_d$.

It is not difficult to see that $d_s$ is different from $d_d$, as witnessed by the
following example. A more accurate comparison is given in Section 5.

Example 4.1 Consider the states in Figure 1. We first observe that
$d_d(\bar{s}_2, \bar{t}_3) = 0$ because $s_2$ and $t_3$ can match each other's action exactly.
Similarly, we have $d_d(\bar{s}_3, \bar{t}_4) = 0$. Then it is straightforward to see that
$d_d(\frac{1}{2}\bar{s}_2 + \frac{1}{2}\bar{s}_3,\ \frac{1}{2}\bar{t}_3 + \frac{1}{2}\bar{t}_4) = 0$. Since $\bar{s}_1$ makes a transition to
$\frac{1}{2}\bar{s}_2 + \frac{1}{2}\bar{s}_3$ and $\frac{1}{2}\bar{t}_1 + \frac{1}{2}\bar{t}_2$ makes a transition with the same action to
$\frac{1}{2}\bar{t}_3 + \frac{1}{2}\bar{t}_4$, we infer that $d_d(\bar{s}_1,\ \frac{1}{2}\bar{t}_1 + \frac{1}{2}\bar{t}_2) = 0$. This, in turn, implies
$d_d(\bar{s}, \bar{t}) = 0$. We have already seen in Example 3.1 that $d_s(s,t) = \frac{1}{2}$. Therefore, the two
distance functions $d_s$ and $d_d$ are indeed different.


We now turn to the logical characterisation of $d_d$. Consider the metric logic $\mathcal{L}^{D*}$ whose formulae are defined below:

$$\psi ::= \top \mid \neg\psi \mid \psi \ominus p \mid \psi_1 \wedge \psi_2 \mid \langle a\rangle\psi. \qquad (15)$$

This logic is the same as that defined in (13) except that now we only have distribution formulae. The semantic interpretation of formulae comes with no surprise.


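Definition 4.3 A formula $\psi \in \mathcal{L}^{D*}$ evaluates in $\Delta \in \mathcal{D}_{sub}(S)$ as follows:

$$\begin{aligned}
[\![\top]\!](\Delta) &= |\Delta| \\
[\![\neg\psi]\!](\Delta) &= 1 - [\![\psi]\!](\Delta) \\
[\![\psi \ominus p]\!](\Delta) &= \max([\![\psi]\!](\Delta) - p,\ 0) \\
[\![\psi_1 \wedge \psi_2]\!](\Delta) &= \min([\![\psi_1]\!](\Delta),\ [\![\psi_2]\!](\Delta)) \\
[\![\langle a\rangle\psi]\!](\Delta) &= \max_{\Delta \xrightarrow{a} \Delta'} [\![\psi]\!](\Delta').
\end{aligned}$$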
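As an illustration only, the clauses of Definition 4.3 can be turned into a small evaluator. The sketch below is not taken from the paper: it assumes a deterministic pLTS (each state has at most one transition per action), so that every subdistribution has exactly one lifted successor per action and the maximum in the last clause ranges over a single element. The tuple encoding of formulae and the names `mass`, `successor` and `evaluate` are hypothetical.

```python
# Sketch only: formula evaluation for a *deterministic* pLTS (an assumption).
# A subdistribution is a dict mapping states to probabilities (total mass <= 1).
# trans maps (state, action) to the unique successor distribution, if any.

def mass(delta):
    """Total mass |delta| of a subdistribution."""
    return sum(delta.values())

def successor(delta, action, trans):
    """Unique lifted successor: states that cannot perform the action lose their mass."""
    result = {}
    for s, p in delta.items():
        for u, q in trans.get((s, action), {}).items():
            result[u] = result.get(u, 0.0) + p * q
    return result

def evaluate(formula, delta, trans):
    """Value of a formula (encoded as nested tuples) in the subdistribution delta."""
    tag = formula[0]
    if tag == "top":
        return mass(delta)
    if tag == "not":
        return 1.0 - evaluate(formula[1], delta, trans)
    if tag == "minus":   # the testing operator: psi (-) p
        return max(evaluate(formula[1], delta, trans) - formula[2], 0.0)
    if tag == "and":
        return min(evaluate(formula[1], delta, trans),
                   evaluate(formula[2], delta, trans))
    if tag == "dia":     # the modality <a> psi
        return evaluate(formula[2], successor(delta, formula[1], trans), trans)
    raise ValueError(f"unknown formula tag: {tag}")

# Toy system: s --a--> 1/2 x + 1/2 y, and only x can perform b.
trans = {("s", "a"): {"x": 0.5, "y": 0.5}, ("x", "b"): {"z": 1.0}}
phi = ("dia", "a", ("dia", "b", ("top",)))
value = evaluate(phi, {"s": 1.0}, trans)
```

In this toy system the value of `phi` at the point distribution on `s` is the mass that survives the `a` step followed by the `b` step. For a nondeterministic pLTS the last clause would need a genuine maximisation over all lifted successors, which this sketch does not attempt.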


This induces a natural logical metric $d^l_d$ over subdistributions defined by

$$d^l_d(\Delta, \Theta) = \sup_{\psi \in \mathcal{L}^{D*}} |[\![\psi]\!](\Delta) - [\![\psi]\!](\Theta)|.$$

It turns out that $d^l_d$ coincides with $d_d$. Below we show that one metric is dominated by the other and vice versa.

Lemma 4.2 $d^l_d \sqsubseteq d_d$


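Proof: Similar to the proof of Lemma 3.1. We proceed by structural induction on formulae. For any two subdistributions $\Delta_1, \Delta_2 \in \mathcal{D}_{sub}(S)$, we prove that

$$|[\![\psi]\!](\Delta_1) - [\![\psi]\!](\Delta_2)| \le d_d(\Delta_1, \Delta_2)$$

for all $\psi \in \mathcal{L}^{D*}$. Let us analyze the structure of $\psi$.

• $\psi \equiv \top$. Then it is trivial to see that $|[\![\psi]\!](\Delta_1) - [\![\psi]\!](\Delta_2)| = |\,|\Delta_1| - |\Delta_2|\,| \le d_d(\Delta_1, \Delta_2)$.

• $\psi \equiv \neg\psi'$. Then $|[\![\psi]\!](\Delta_1) - [\![\psi]\!](\Delta_2)| = |[\![\psi']\!](\Delta_2) - [\![\psi']\!](\Delta_1)| \le d_d(\Delta_1, \Delta_2)$, where the inequality holds by induction.

• $\psi \equiv \psi' \ominus p$. There are four subcases and we consider one of them. Suppose $[\![\psi']\!](\Delta_1) > p$ and $[\![\psi']\!](\Delta_2) \le p$; then $|[\![\psi]\!](\Delta_1) - [\![\psi]\!](\Delta_2)| = |[\![\psi']\!](\Delta_1) - p| \le |[\![\psi']\!](\Delta_1) - [\![\psi']\!](\Delta_2)| \le d_d(\Delta_1, \Delta_2)$ by induction.

• $\psi \equiv \psi_1 \wedge \psi_2$. Without loss of generality we assume that $[\![\psi]\!](\Delta_1) \ge [\![\psi]\!](\Delta_2)$. There are two possibilities:

  - If $[\![\psi_1]\!](\Delta_2) \le [\![\psi_2]\!](\Delta_2)$, then $[\![\psi]\!](\Delta_1) - [\![\psi]\!](\Delta_2) \le [\![\psi_1]\!](\Delta_1) - [\![\psi_1]\!](\Delta_2) \le d_d(\Delta_1, \Delta_2)$, where the last inequality holds by induction.

  - Symmetrically, if $[\![\psi_2]\!](\Delta_2) \le [\![\psi_1]\!](\Delta_2)$, then $[\![\psi]\!](\Delta_1) - [\![\psi]\!](\Delta_2) \le [\![\psi_2]\!](\Delta_1) - [\![\psi_2]\!](\Delta_2) \le d_d(\Delta_1, \Delta_2)$.

• $\psi \equiv \langle a\rangle\psi'$. If $d_d(\Delta_1, \Delta_2) = 1$ the inequality holds trivially, because formulae take values in $[0,1]$; so suppose $d_d(\Delta_1, \Delta_2) < 1$. Without loss of generality we assume that $[\![\psi]\!](\Delta_1) \ge [\![\psi]\!](\Delta_2)$. Let $\Delta_1'$ be a distribution such that $\Delta_1 \xrightarrow{a} \Delta_1'$ and $[\![\langle a\rangle\psi']\!](\Delta_1) = [\![\psi']\!](\Delta_1')$. Since $d_d$ is a distribution-based bisimulation metric, by definition there exists some $\Delta_2'$ such that $\Delta_2 \xrightarrow{a} \Delta_2'$ and $d_d(\Delta_1', \Delta_2') \le d_d(\Delta_1, \Delta_2)$. It follows that

$$\begin{aligned}
[\![\psi]\!](\Delta_1) - [\![\psi]\!](\Delta_2)
&= [\![\psi']\!](\Delta_1') - \max_{\Delta_2 \xrightarrow{a} \Delta_2''} [\![\psi']\!](\Delta_2'') \\
&\le [\![\psi']\!](\Delta_1') - [\![\psi']\!](\Delta_2') \\
&\le d_d(\Delta_1', \Delta_2') \quad \text{by induction on } \psi' \\
&\le d_d(\Delta_1, \Delta_2).
\end{aligned}$$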


Lemma 4.3 $d_d \sqsubseteq d^l_d$

Proof: The proof is similar to that of Lemma 3.3, so we omit it.

By combining the previous two lemmas, we obtain the logical characterisation of $d_d$.

Theorem 4.1 $d_d = d^l_d$




5 Comparison of the Bisimilarity Metrics 


In this section, we compare the state-based bisimilarity metric $d_s$ with the distribution-based bisimilarity metric $d_d$. More precisely, we show that $d_d$ is a lower bound of $K(d_s)$ when measuring full distributions$^1$. The proof makes use of fully enabled pLTSs as a stepping stone. Let us first fix an overall set of actions $Act$ and a special action $\bot \notin Act$. Let $EA(s) = \{a \mid \exists\Delta.\ s \xrightarrow{a} \Delta\}$ be the set of actions that are enabled at state $s$. We also use $\bot$ to stand for a special state when there is no confusion.


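Definition 5.1 A pLTS with state set $S$ is fully enabled if for any state $s \in S\setminus\{\bot\}$ we have $EA(s) = Act$. Given any pLTS $\mathcal{A} = (S, A, \to)$ with $A \subseteq Act$, we can convert it into a fully enabled pLTS $\mathcal{A}_\bot = (S_\bot, Act\cup\{\bot\}, \to_\bot)$ as follows:

$$\begin{aligned}
S_\bot &= \{s_\bot \mid s \in S\} \cup \{\bot\} \\
\to_\bot &= \{(s_\bot, a, \Delta_\bot) \mid (s, a, \Delta) \in\ \to\} \\
&\quad \cup\ \{(s_\bot, a, \overline{\bot}) \mid s \not\xrightarrow{a} \text{ and } a \in Act\} \\
&\quad \cup\ \{(\bot, a, \overline{\bot}) \mid a \in Act\cup\{\bot\}\},
\end{aligned}$$

where $\Delta_\bot(s_\bot) = \Delta(s)$ for each $s \in S$ and $\Delta_\bot(\bot) = 1 - |\Delta|$. In other words, each state $s$ in $\mathcal{A}$ corresponds to a state $s_\bot$ in $\mathcal{A}_\bot$ such that $s_\bot$ keeps all the transitions of $s$ and can evolve into the absorbing state $\bot$ by performing any action in $Act$ not enabled by $s$. As a consequence, each subdistribution $\Delta$ on the states of $\mathcal{A}$ has a corresponding full distribution $\Delta_\bot$ on the states of $\mathcal{A}_\bot$.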
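The conversion in Definition 5.1 is mechanical, and a small sketch may help to see it at work. The code below is only an illustration under assumptions that are not in the paper: subdistributions are dicts from state names to probabilities, the transition relation is a dict from (state, action) pairs to lists of successor distributions, each state $s$ is identified with its copy $s_\bot$, and the string `"bot"` (hypothetical, like all names here) plays the role of both the special state and the special action.

```python
# Sketch of the fully enabled construction (names and data layout are assumptions).
# trans maps (state, action) to a list of successor distributions (dicts).
BOT = "bot"  # stands for the special state/action; assumed not to clash with real names

def lift_distribution(delta):
    """Turn a subdistribution into a full one by sending the missing mass to BOT."""
    lifted = dict(delta)
    missing = 1.0 - sum(delta.values())
    if missing > 1e-12:
        lifted[BOT] = lifted.get(BOT, 0.0) + missing
    return lifted

def fully_enable(states, actions, trans):
    """Return the transition table of the fully enabled pLTS over the action set `actions`."""
    result = {}
    for s in states:
        for a in actions:
            succs = trans.get((s, a), [])
            if succs:   # keep every transition of s
                result[(s, a)] = [lift_distribution(d) for d in succs]
            else:       # a is not enabled at s: move to the absorbing state
                result[(s, a)] = [{BOT: 1.0}]
    for a in list(actions) + [BOT]:   # BOT loops on every action, including BOT itself
        result[(BOT, a)] = [{BOT: 1.0}]
    return result

table = fully_enable(["s", "t"], ["a", "b"], {("s", "a"): [{"t": 1.0}]})
```

After the call, every state other than `BOT` enables exactly the actions `a` and `b`, and only `BOT` enables the action `BOT`, which mirrors the two properties used in the proofs of this section.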

For any pLTS, let $s, t$ be two states and $\Delta, \Theta$ two subdistributions. It can be shown that $d_s(s,t) = d_s(s_\bot, t_\bot)$ and $d_d(\Delta, \Theta) = d_d(\Delta_\bot, \Theta_\bot)$. Moreover, for fully enabled pLTSs, the metric $d_d$ turns out to be a lower bound of $K(d_s)$ as far as distributions are concerned. Before proving those properties, we first present the following technical lemma.

Lemma 5.1 For any subdistribution $\Delta$ on $\mathcal{A}$ and any $a \in Act$, $\Delta \xrightarrow{a} \Delta_1$ if and only if $\Delta_\bot \xrightarrow{a} (\Delta_1)_\bot$.


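Proof: ($\Rightarrow$) Suppose $\Delta$ is a subdistribution on $\mathcal{A}$ and $\Delta \xrightarrow{a} \Delta_1$ for some $a \in Act$ and subdistribution $\Delta_1$. By Lemma 4.1 we can decompose $\Delta_1$ such that $\Delta_1 = \sum_{s\in\lceil\Delta\rceil} \Delta(s)\cdot\Delta_s$, with $\overline{s} \xrightarrow{a} \Delta_s$ for each $s \in S'$, where $S'$ is the set of states in the support of $\Delta$ that can enable action $a$, and $\Delta_s = \varepsilon$ if $s \notin S'$. For each $s \in S'$, the state $s_\bot$ keeps all the transitions of $s$, so we have some distribution $(\Delta_s)_\bot$ with $s_\bot \xrightarrow{a} (\Delta_s)_\bot$. For each $s \notin S'$, we have $s_\bot \xrightarrow{a} \overline{\bot}$. It follows that

$$\Delta_\bot \xrightarrow{a} \Big(\sum_{s\in S'} \Delta(s)\cdot(\Delta_s)_\bot + \sum_{s\in\lceil\Delta\rceil\setminus S'} \Delta(s)\cdot\overline{\bot} + (1-|\Delta|)\cdot\overline{\bot}\Big) = (\Delta_1)_\bot.$$

($\Leftarrow$) Suppose $\Delta_\bot \xrightarrow{a} \Delta'$ for some subdistribution $\Delta$ on $\mathcal{A}$ and some action $a \in Act$. We have that $\Delta_\bot = \sum_{s\in\lceil\Delta\rceil} \Delta(s)\cdot\overline{s_\bot} + (1-|\Delta|)\cdot\overline{\bot}$. By Lemma 4.1 we have that $\Delta' = \big(\sum_{s\in\lceil\Delta\rceil} \Delta(s)\cdot\Theta_s\big) + (1-|\Delta|)\cdot\overline{\bot}$, where $\Theta_s = (\Delta_s)_\bot$ if $s$ enables $a$ and $\overline{s} \xrightarrow{a} \Delta_s$ for some distribution $\Delta_s$, or $\Theta_s = \overline{\bot}$ if $s$ cannot enable action $a$. Let $S'$ be the set of states in the support of $\Delta$ that can enable action $a$. We have that

$$\Delta' = \Big(\sum_{s\in S'} \Delta(s)\cdot(\Delta_s)_\bot\Big) + \Big(1 - \sum_{s\in S'} \Delta(s)\Big)\cdot\overline{\bot} = \Big(\sum_{s\in S'} \Delta(s)\cdot\Delta_s\Big)_\bot.$$

By setting $\Delta_1 = \sum_{s\in S'} \Delta(s)\cdot\Delta_s$, we indeed have that $\Delta \xrightarrow{a} \Delta_1$.

$^1$ Although $d_d$ can measure the distance between two subdistributions, the Kantorovich lifting of $d_s$ can only measure the distance between full distributions or subdistributions of equal mass, which can easily be normalized to full distributions.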


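Lemma 5.2

1. Let $s, t$ be any two states of $\mathcal{A}$. Then $d_s(s,t) = d_s(s_\bot, t_\bot)$.

2. Let $\Delta, \Theta$ be two distributions on $\mathcal{A}$. Then $K(d_s)(\Delta, \Theta) = K(d_s)(\Delta_\bot, \Theta_\bot)$.

3. Let $\Delta, \Theta$ be two subdistributions on $\mathcal{A}$. Then $d_d(\Delta, \Theta) = d_d(\Delta_\bot, \Theta_\bot)$.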


Proof: 


1. By Proposition 3.1 we see that $d_s = \bigsqcup_{i\in\mathbb{N}} d_i$, where $d_0 = 0$ and $d_{i+1} = F_s(d_i)$ for all $i \in \mathbb{N}$. We show by induction on $i$ that $d_i(s,t) = d_i(s_\bot, t_\bot)$ for all $i \in \mathbb{N}$. The base case is trivial. Let us consider the inductive step.

$$\begin{aligned}
d_{i+1}(s,t) &= \sup_{a\in Act}\{H(K(d_i))(der(s,a), der(t,a))\} \\
d_{i+1}(s_\bot,t_\bot) &= \sup_{a\in Act\cup\{\bot\}}\{H(K(d_i))(der(s_\bot,a), der(t_\bot,a))\}
\end{aligned}$$

If $X = \{\Delta_1, \ldots, \Delta_n\}$ is a set of distributions on $\mathcal{A}$, we denote by $(X)_\bot$ the set $\{(\Delta_1)_\bot, \ldots, (\Delta_n)_\bot\}$. For any $a \in Act$, we distinguish four cases:

(a) Both $s$ and $t$ can enable action $a$ in $\mathcal{A}$. That is, neither $der(s,a)$ nor $der(t,a)$ is empty. Then $s \xrightarrow{a} \Delta$ iff $s_\bot \xrightarrow{a} \Delta_\bot$. That is, each $a$-successor distribution of $s$, say $\Delta$, has a corresponding $a$-successor distribution of $s_\bot$, say $\Delta_\bot$, and vice versa. Similarly, for each $\Theta \in der(t,a)$, we have $\Theta_\bot \in der(t_\bot,a)$. This means that $der(s_\bot,a) = (der(s,a))_\bot$ and $der(t_\bot,a) = (der(t,a))_\bot$. By induction, we have that $d_i(u,v) = d_i(u_\bot,v_\bot)$ for any $u, v \in S$. It follows that

$$K(d_i)(\Delta, \Theta) = K(d_i)(\Delta_\bot, \Theta_\bot)$$

and moreover,

$$H(K(d_i))(der(s,a), der(t,a)) = H(K(d_i))(der(s_\bot,a), der(t_\bot,a)).$$

(b) State $s$ cannot enable action $a$ but state $t$ can enable action $a$. Then $der(s,a) = \emptyset$, $der(s_\bot,a) = \{\overline{\bot}\}$, $der(t,a) \ne \emptyset$ and $\overline{\bot} \notin der(t_\bot,a)$. Clearly, $K(d_i)(\overline{\bot}, \Theta_\bot) = 1$ for any $\Theta_\bot \in der(t_\bot,a)$ because here $\Theta$ is a full distribution and $t' \ne \bot$ for any $t' \in \lceil\Theta\rceil$. It follows that

$$H(K(d_i))(der(s,a), der(t,a)) = 1 = H(K(d_i))(der(s_\bot,a), der(t_\bot,a)).$$

(c) The symmetric case of (b), obtained by exchanging the roles of $s$ and $t$. We also have

$$H(K(d_i))(der(s,a), der(t,a)) = 1 = H(K(d_i))(der(s_\bot,a), der(t_\bot,a)).$$

(d) Neither $s$ nor $t$ can enable action $a$. Then $der(s,a) = der(t,a) = \emptyset$ and $der(s_\bot,a) = der(t_\bot,a) = \{\overline{\bot}\}$. It follows that

$$H(K(d_i))(der(s,a), der(t,a)) = 0 = H(K(d_i))(der(s_\bot,a), der(t_\bot,a)).$$

In all the cases above, we always have the following equation

$$H(K(d_i))(der(s,a), der(t,a)) = H(K(d_i))(der(s_\bot,a), der(t_\bot,a)) \qquad (16)$$

for any $a \in Act$. For the action $\bot$, we have $der(s_\bot,\bot) = der(t_\bot,\bot) = \emptyset$. Hence,

$$H(K(d_i))(der(s_\bot,\bot), der(t_\bot,\bot)) = 0. \qquad (17)$$

By using (16) and (17), we can reason that

$$\begin{aligned}
d_{i+1}(s,t) &= \sup_{a\in Act}\{H(K(d_i))(der(s,a), der(t,a))\} \\
&= \sup_{a\in Act}\{H(K(d_i))(der(s_\bot,a), der(t_\bot,a))\} \\
&= \sup_{a\in Act\cup\{\bot\}}\{H(K(d_i))(der(s_\bot,a), der(t_\bot,a))\} \\
&= d_{i+1}(s_\bot, t_\bot).
\end{aligned}$$


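2. Since $\Delta$ and $\Theta$ are full distributions, so are $\Delta_\bot$ and $\Theta_\bot$. Then Clause 2 follows from Clause 1 immediately.

3. The proof is similar to that of Clause 1. By Proposition 4.1 we see that $d_d = \bigsqcup_{i\in\mathbb{N}} d_i$, where $d_0 = 0$ and $d_{i+1} = F_d(d_i)$ for all $i \in \mathbb{N}$. For distributions on $\mathcal{A}_\bot$, we need to consider the special action $\bot$ too. We show by induction on $i$ that $d_i(\Delta, \Theta) = d_i(\Delta_\bot, \Theta_\bot)$ for all $i \in \mathbb{N}$. The base case is trivial. Let us consider the inductive step.

$$d_{i+1}(\Delta, \Theta) = \max\Big(\sup_{a\in A}\{H(d_i)(der(\Delta,a), der(\Theta,a))\},\ |\,|\Delta| - |\Theta|\,|\Big).$$

Since $\Delta_\bot$ and $\Theta_\bot$ are full distributions, there is no need to compare their masses. Hence,

$$d_{i+1}(\Delta_\bot, \Theta_\bot) = \sup_{a\in Act\cup\{\bot\}}\{H(d_i)(der(\Delta_\bot,a), der(\Theta_\bot,a))\}.$$

By Lemma 5.1, for any $\Delta$ on $\mathcal{A}$ and $a \in A$, we have the correspondence of transitions $\Delta \xrightarrow{a} \Delta_1$ iff $\Delta_\bot \xrightarrow{a} (\Delta_1)_\bot$. Similarly, for each $\Theta_1 \in der(\Theta,a)$, we have $(\Theta_1)_\bot \in der(\Theta_\bot,a)$, and vice versa. This means that $der(\Delta_\bot,a) = (der(\Delta,a))_\bot$ and $der(\Theta_\bot,a) = (der(\Theta,a))_\bot$. By induction, $d_i(\Delta_1, \Theta_1) = d_i((\Delta_1)_\bot, (\Theta_1)_\bot)$ for any $\Delta_1, \Theta_1 \in \mathcal{D}_{sub}(S)$. It follows that

$$H(d_i)(der(\Delta,a), der(\Theta,a)) = H(d_i)(der(\Delta_\bot,a), der(\Theta_\bot,a)).$$

Therefore,

$$\sup_{a\in A}\{H(d_i)(der(\Delta,a), der(\Theta,a))\} = \sup_{a\in A}\{H(d_i)(der(\Delta_\bot,a), der(\Theta_\bot,a))\}.$$

Observe that in $\mathcal{A}_\bot$ no state except for $\bot$ can enable action $\bot$, which means that the following equality holds: $der(\Delta_\bot,\bot) = \{(1-|\Delta|)\cdot\overline{\bot}\}$. Similarly, $der(\Theta_\bot,\bot) = \{(1-|\Theta|)\cdot\overline{\bot}\}$. We then have that

$$\begin{aligned}
H(d_i)(der(\Delta_\bot,\bot), der(\Theta_\bot,\bot)) &= d_i((1-|\Delta|)\cdot\overline{\bot},\ (1-|\Theta|)\cdot\overline{\bot}) \\
&= |(1-|\Delta|) - (1-|\Theta|)| \\
&= |\,|\Delta| - |\Theta|\,|.
\end{aligned}$$

Now it is easy to see that $d_{i+1}(\Delta, \Theta) = d_{i+1}(\Delta_\bot, \Theta_\bot)$ as required.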
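The case analysis in the proof above relies on how the Hausdorff lifting $H$ treats empty sets of successors. The following sketch is an illustration only: it assumes the usual two-sided Hausdorff lifting with the conventions that an infimum over the empty set is 1 and a supremum over the empty set is 0, which reproduces the values 0 and 1 used in cases (d) and (b). The function names and the toy distance `mass_gap` are hypothetical.

```python
# Sketch: Hausdorff lifting of a 1-bounded distance to finite collections.
# Convention assumed here: inf over the empty set is 1, sup over the empty set is 0.

def hausdorff(d, xs, ys):
    """H(d)(xs, ys) for finite collections xs and ys and a distance function d."""
    def one_sided(from_set, to_set):
        # sup over from_set of the inf over to_set of the pairwise distance
        return max((min((d(x, y) for y in to_set), default=1.0)
                    for x in from_set), default=0.0)
    return max(one_sided(xs, ys), one_sided(ys, xs))

def mass_gap(x, y):
    """Toy distance between subdistributions: difference of their total masses."""
    return abs(sum(x.values()) - sum(y.values()))

bot_a = {"bot": 0.25}   # plays the role of (1 - |Delta|) * bot with |Delta| = 0.75
bot_b = {"bot": 0.5}    # plays the role of (1 - |Theta|) * bot with |Theta| = 0.5
gap = hausdorff(mass_gap, [bot_a], [bot_b])
```

With singleton sets the lifting collapses to the underlying distance, which is the step used for the action $\bot$ at the end of Clause 3.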


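Theorem 5.1 Let $\Delta, \Theta$ be two distributions on a fully enabled pLTS. Then $d_d(\Delta, \Theta) \le K(d_s)(\Delta, \Theta)$.

Proof: We will prove that $K(d_s)$ is a distribution-based bisimulation metric for fully enabled pLTSs. Since $d_d$ is the smallest distribution-based bisimulation metric, it follows that $d_d(\Delta, \Theta) \le K(d_s)(\Delta, \Theta)$.

By assumption, both $\Delta$ and $\Theta$ are distributions. It is trivial to see that

$$|\,|\Delta| - |\Theta|\,| = 0 \le K(d_s)(\Delta, \Theta).$$

Suppose $K(d_s)(\Delta, \Theta) < 1$ and $\Delta \xrightarrow{a} \Delta'$. Then for any $s \in \lceil\Delta\rceil$, there exists some $\Delta_s$ such that $s \xrightarrow{a} \Delta_s$ and $\Delta' = \sum_{s\in\lceil\Delta\rceil} \Delta(s)\cdot\Delta_s$, because the pLTS under consideration is fully enabled. Let $S$ be the state set of the pLTS excluding the special state $\bot$. For any $t \in S$, we observe that $d_s(s,t) < 1$ because in a fully enabled pLTS no two states different from $\bot$ have distance 1. So by the definition of $d_s$, there exists some $\Theta_t$ such that $t \xrightarrow{a} \Theta_t$ and $K(d_s)(\Delta_s, \Theta_t) \le d_s(s,t)$. We define $\Theta' = \sum_{t\in S} \Theta(t)\cdot\Theta_t$ and it is easy to see that $\Theta \xrightarrow{a} \Theta'$.

Let $\omega \in \Omega(\Delta, \Theta)$ be a weight function satisfying

$$K(d_s)(\Delta, \Theta) = \sum_{s,t\in S} \omega(s,t)\cdot d_s(s,t).$$

Similarly, let $\omega_{s,t} \in \Omega(\Delta_s, \Theta_t)$ be a weight function satisfying

$$K(d_s)(\Delta_s, \Theta_t) = \sum_{u,v\in S} \omega_{s,t}(u,v)\cdot d_s(u,v).$$

Define $\omega' \in \mathcal{D}(S\times S)$ as follows:

$$\omega'(u,v) = \sum_{s,t\in S} \omega(s,t)\cdot\omega_{s,t}(u,v)$$

for any $u, v \in S$. We check that $\omega'$ is a weight function for $\Delta'$ and $\Theta'$.

$$\begin{aligned}
\sum_{u\in S} \omega'(u,v) &= \sum_{u\in S}\sum_{s,t\in S} \omega(s,t)\cdot\omega_{s,t}(u,v) \\
&= \sum_{s,t\in S} \omega(s,t)\sum_{u\in S} \omega_{s,t}(u,v) \\
&= \sum_{s,t\in S} \omega(s,t)\cdot\Theta_t(v) \\
&= \sum_{t\in S} \Theta(t)\cdot\Theta_t(v) \\
&= \Theta'(v)
\end{aligned}$$

for any $v \in S$. Similarly, we can infer that $\sum_{v\in S} \omega'(u,v) = \Delta'(u)$ for any $u \in S$. Therefore, we have $\omega' \in \Omega(\Delta', \Theta')$, from which we can do the following reasoning:

$$\begin{aligned}
K(d_s)(\Delta', \Theta') &\le \sum_{u,v\in S} \omega'(u,v)\cdot d_s(u,v) \\
&= \sum_{u,v\in S}\sum_{s,t\in S} \omega(s,t)\cdot\omega_{s,t}(u,v)\cdot d_s(u,v) \\
&= \sum_{s,t\in S} \omega(s,t)\sum_{u,v\in S} \omega_{s,t}(u,v)\cdot d_s(u,v) \\
&= \sum_{s,t\in S} \omega(s,t)\cdot K(d_s)(\Delta_s, \Theta_t) \\
&\le \sum_{s,t\in S} \omega(s,t)\cdot d_s(s,t) \\
&= K(d_s)(\Delta, \Theta).
\end{aligned}$$

In summary, we have shown that $K(d_s)$ is a distribution-based bisimulation metric. This completes the proof.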
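The key computation in the proof above is that composing a weight function for $(\Delta, \Theta)$ with weight functions for the successor pairs yields a weight function for $(\Delta', \Theta')$. The sketch below checks the marginals numerically on a tiny hand-made instance. It is an illustration only: the dict-based encoding, the names `compose` and `marginals`, and the concrete numbers are assumptions, not taken from the paper.

```python
# Sketch: composing weight functions (couplings) and checking the marginals.
# Couplings are dicts from pairs of states to weights; all names are illustrative.

def compose(omega, omega_st):
    """omega'(u, v) = sum over (s, t) of omega(s, t) * omega_st[(s, t)](u, v)."""
    result = {}
    for (s, t), w in omega.items():
        for (u, v), q in omega_st[(s, t)].items():
            result[(u, v)] = result.get((u, v), 0.0) + w * q
    return result

def marginals(coupling):
    """Left and right marginals of a coupling."""
    left, right = {}, {}
    for (u, v), w in coupling.items():
        left[u] = left.get(u, 0.0) + w
        right[v] = right.get(v, 0.0) + w
    return left, right

# Delta = 1/2 s1 + 1/2 s2 and Theta = t1, with successors
# Delta_s1 = u, Delta_s2 = v and Theta_t1 = 1/2 x + 1/2 y.
omega = {("s1", "t1"): 0.5, ("s2", "t1"): 0.5}
omega_st = {
    ("s1", "t1"): {("u", "x"): 0.5, ("u", "y"): 0.5},
    ("s2", "t1"): {("v", "x"): 0.5, ("v", "y"): 0.5},
}
left, right = marginals(compose(omega, omega_st))
```

Here `left` coincides with $\Delta' = \frac{1}{2}\overline{u} + \frac{1}{2}\overline{v}$ and `right` with $\Theta' = \frac{1}{2}\overline{x} + \frac{1}{2}\overline{y}$, as the marginal calculation in the proof predicts.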
Then we arrive at the following theorem. 


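Theorem 5.2 Let $\Delta, \Theta$ be two distributions on a pLTS. Then $d_d(\Delta, \Theta) \le K(d_s)(\Delta, \Theta)$.

Proof: Let $\Delta, \Theta$ be two distributions on a pLTS $\mathcal{A}$. Let $\Delta_\bot, \Theta_\bot$ be the corresponding distributions on the fully enabled pLTS $\mathcal{A}_\bot$. It follows from Lemma 5.2(2)-(3) and Theorem 5.1 that

$$d_d(\Delta, \Theta) = d_d(\Delta_\bot, \Theta_\bot) \le K(d_s)(\Delta_\bot, \Theta_\bot) = K(d_s)(\Delta, \Theta).$$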


6 Bisimulations 


The kernel of $d_s$ (resp. $d_d$) is the state-based (resp. distribution-based) bisimilarity, denoted by $\sim_s$ (resp. $\sim_d$). They can be defined in a more direct way. The definition of $\sim_s$ requires us to lift a relation on states to a relation on distributions. There are several different but equivalent formulations of the lifting operation, and they are closely related to the Kantorovich metric; see [11] for more details. The following one is taken from [16].


Definition 6.1 Given two sets $S, T$ and a binary relation $R \subseteq S\times T$, we define the lifted binary relation $R^\dagger \subseteq \mathcal{D}_{sub}(S)\times\mathcal{D}_{sub}(T)$ as the smallest relation satisfying the following two rules:

1. $s\ R\ t$ implies $\overline{s}\ R^\dagger\ \overline{t}$;

2. $\Delta_i\ R^\dagger\ \Theta_i$ for all $i \in I$ implies $(\sum_{i\in I} p_i\cdot\Delta_i)\ R^\dagger\ (\sum_{i\in I} p_i\cdot\Theta_i)$, where $I$ is a finite index set and $\sum_{i\in I} p_i \le 1$.

The state-based bisimilarity $\sim_s$ is essentially Larsen and Skou's probabilistic bisimilarity [37], which was originally defined for deterministic systems.

Definition 6.2 Let $\sim_s \subseteq S\times S$ be the largest symmetric relation such that if $s \sim_s t$ and $s \xrightarrow{a} \Delta$ then there exists some $t \xrightarrow{a} \Theta$ with $\Delta\ (\sim_s)^\dagger\ \Theta$.


The distribution-based bisimilarity $\sim_d$ is proposed in [14] as a sound and complete coinductive proof technique for linear contextual equivalence, a natural extensional behavioural equivalence for functional programs.

Definition 6.3 Let $\sim_d \subseteq \mathcal{D}_{sub}(S)\times\mathcal{D}_{sub}(S)$ be the largest symmetric relation such that if $\Delta \sim_d \Theta$ then $|\Delta| = |\Theta|$ and $\Delta \xrightarrow{a} \Delta'$ implies the existence of some $\Theta'$ such that $\Theta \xrightarrow{a} \Theta'$ and $\Delta' \sim_d \Theta'$.


Notice that, for any states $s, t \in S$, the following three statements are equivalent:

(i) $s \sim_s t$;
(ii) $d_s(s,t) = 0$;
(iii) $[\![\varphi]\!](s) = [\![\varphi]\!](t)$ for any formula $\varphi \in \mathcal{L}^{S}$.

Similarly, for any subdistributions $\Delta, \Theta \in \mathcal{D}_{sub}(S)$, the following three statements are equivalent:

(i) $\Delta \sim_d \Theta$;
(ii) $d_d(\Delta, \Theta) = 0$;
(iii) $[\![\psi]\!](\Delta) = [\![\psi]\!](\Theta)$ for any formula $\psi \in \mathcal{L}^{D*}$.


Although the state-based bisimilarity is widely accepted, there is no general 
agreement on what is a good notion of distribution-based bisimilarity. In 
the literature [29, 15, 24, 20, 23, 31], several variations of distribution-based 
bisimulations have been proposed. Some of them are defined for pLTSs with 
states labelled by atomic propositions. We adapt them to our setting so as 
to compare them with $\sim_d$.

In a pLTS $(S, L, \to)$, a transition goes from a state to a distribution, e.g. $s \xrightarrow{a} \Delta$. In order to lift $\to$ to be a relation between distributions, e.g. $\Delta \xrightarrow{a} \Theta$, usually we need to decide whether

(i) to require all the states in the support of $\Delta$ to perform action $a$;
(ii) to combine transitions with the same label, which we explain below.

In [24, 20, 23] both (i) and (ii) are imposed, while in [31] and also in our definition of $\sim_d$ (i) is not used. The condition (ii) is built in Definition 4.1 but partially used in [31], as we will see in the sequel. Let $\{s \xrightarrow{a} \Delta_i\}_{i\in I}$ be a collection of transitions, and $\{p_i\}_{i\in I}$ be a collection of probabilities with $\sum_{i\in I} p_i = 1$. Then $s \xrightarrow{a}_c (\sum_{i\in I} p_i\cdot\Delta_i)$ is called a combined transition [43]. Let us write $\Delta \xrightarrow{a}_c \Theta$ if $s \xrightarrow{a}_c \Delta_s$ for each $s \in \lceil\Delta\rceil$ and $\Theta = \sum_{s\in\lceil\Delta\rceil} \Delta(s)\cdot\Delta_s$.
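A combined transition is just a convex combination of the targets of equally labelled transitions, and its lifting to (sub)distributions weights the chosen successor of each state by the mass of that state. A minimal sketch, assuming dict-encoded distributions and hypothetical helper names `combine` and `lift_combined`:

```python
# Sketch: combined transitions as convex combinations (illustrative names only).

def combine(weighted):
    """Return sum_i p_i * Delta_i for a list of (p_i, Delta_i) pairs with sum p_i = 1."""
    total = sum(p for p, _ in weighted)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities of a combined transition must sum to 1")
    result = {}
    for p, delta in weighted:
        for state, q in delta.items():
            result[state] = result.get(state, 0.0) + p * q
    return result

def lift_combined(delta, chosen):
    """Theta = sum_s Delta(s) * Delta_s, where chosen[s] is the successor picked for s."""
    result = {}
    for s, p in delta.items():
        for state, q in chosen[s].items():
            result[state] = result.get(state, 0.0) + p * q
    return result

# s mixes its two a-transitions evenly; t uses a single a-transition.
mixed = combine([(0.5, {"x": 1.0}), (0.5, {"y": 1.0})])
theta = lift_combined({"s": 0.5, "t": 0.5}, {"s": mixed, "t": {"x": 1.0}})
```

Since every state of the support must contribute a full successor distribution, the result of `lift_combined` has the same total mass as its argument, which is the mass condition stated in Remark 6.1.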


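Remark 6.1 An equivalent way of defining combined transitions is to use Definition 4.1. We have that $s \xrightarrow{a}_c \Delta$ iff $\overline{s} \xrightarrow{a} \Delta$ and $|\Delta| = 1$; $\Delta \xrightarrow{a}_c \Theta$ iff $\Delta \xrightarrow{a} \Theta$ and $|\Delta| = |\Theta|$.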


Note that a simple way of comparing subdistributions is to lift the state-based bisimilarity and use the relation $(\sim_s)^\dagger$. That relation can be slightly weakened by using the combined transition $t \xrightarrow{a}_c \Theta$ in place of $t \xrightarrow{a} \Theta$ in Definition 6.2 to get a coarser notion of state-based bisimilarity called strong probabilistic bisimulation in [43], written $\sim_s^c$, and then lifting it to subdistributions to finally obtain $(\sim_s^c)^\dagger$. This is essentially the relation investigated in [29]. However, most distribution-based bisimilarities proposed in the literature directly compare the transitions between (sub)distributions, so there is no need to define relations on states and then lift them to subdistributions. Below we recall four typical proposals.

Firstly, we adapt the bisimulation of [24] to our setting. Let $(S, A, \to)$ be a pLTS; we extend it to a fully enabled pLTS $(S_\bot, Act\cup\{\bot\}, \to_\bot)$ according to Definition 5.1.


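Definition 6.4 Let $\sim_1 \subseteq \mathcal{D}(S_\bot)\times\mathcal{D}(S_\bot)$ be the largest symmetric relation such that $\Delta \sim_1 \Theta$ implies

1. $\Delta(S) = \Theta(S)$,

2. for each $a \in A$, whenever $\Delta \xrightarrow{a}_c \Delta'$, there exists $\Theta'$ with $\Theta \xrightarrow{a}_c \Theta'$ and $\Delta' \sim_1 \Theta'$.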


Secondly, we adapt the bisimulation in [29, 15] for subdistributions.

Definition 6.5 Let $\sim_2 \subseteq \mathcal{D}_{sub}(S)\times\mathcal{D}_{sub}(S)$ be the largest symmetric relation such that $\Delta \sim_2 \Theta$ implies, for all finite sets of probabilities $\{p_i \mid i \in I\}$ satisfying $\sum_{i\in I} p_i \le 1$,

1. $|\Delta| = |\Theta|$,

2. whenever $\Delta \xrightarrow{a}_c \Delta'$, there exists $\Theta'$ with $\Theta \xrightarrow{a}_c \Theta'$ and $\Delta' \sim_2 \Theta'$,

3. whenever $\Delta = \sum_{i\in I} p_i\cdot\Delta_i$, for any subdistributions $\Delta_i$, there are some subdistributions $\Theta_i$ such that $\Theta = \sum_{i\in I} p_i\cdot\Theta_i$ and $\Delta_i \sim_2 \Theta_i$ for each $i \in I$.


Thirdly, we adapt the bisimulation given in [20] to pLTSs. A subdistribution $\Delta$ is consistent if $EA(s) = EA(t)$ for any $s, t \in \lceil\Delta\rceil$. That is, all the states in the support of $\Delta$ have the same set of enabled actions.
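Consistency is easy to check mechanically. The sketch below is an illustration under assumed encodings (subdistributions as dicts, transitions as a dict from (state, action) pairs to lists of successor distributions); the function names are hypothetical.

```python
# Sketch: checking whether a subdistribution is consistent (illustrative names).
# trans maps (state, action) to a list of successor distributions.

def enabled_actions(state, trans):
    """EA(state): the set of actions with at least one transition from state."""
    return {a for (s, a), succs in trans.items() if s == state and succs}

def is_consistent(delta, trans):
    """True iff all states in the support of delta enable the same set of actions."""
    support = [s for s, p in delta.items() if p > 0]
    action_sets = {frozenset(enabled_actions(s, trans)) for s in support}
    return len(action_sets) <= 1

trans = {
    ("s", "a"): [{"u": 1.0}],
    ("t", "a"): [{"u": 1.0}],
    ("t", "b"): [{"u": 1.0}],
}
mixed_ok = is_consistent({"s": 0.5, "t": 0.5}, trans)   # EA(s) = {a}, EA(t) = {a, b}
```

Point distributions and the empty subdistribution are consistent by this check, since their supports contain at most one state.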


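Definition 6.6 Let $\sim_3 \subseteq \mathcal{D}_{sub}(S)\times\mathcal{D}_{sub}(S)$ be the largest symmetric relation such that $\Delta \sim_3 \Theta$ implies

1. $|\Delta| = |\Theta|$,

2. whenever $\Delta \xrightarrow{a}_c \Delta'$, there exists $\Theta'$ with $\Theta \xrightarrow{a}_c \Theta'$ and $\Delta' \sim_3 \Theta'$,

3. if $\Delta$ is not consistent, there exist decompositions $\Delta = \sum_{i\in I} p_i\cdot\Delta_i$ and $\Theta = \sum_{i\in I} p_i\cdot\Theta_i$ such that $\Delta_i \sim_3 \Theta_i$ for each $i \in I$.

Finally, we adapt the bisimulation of [31]. Let $A$ be a set of labels. We write $s \xrightarrow{A} \Delta$ if $s \xrightarrow{a}_c \Delta$ for some $a \in A$ and denote by $S_A = \{s \mid \exists\Delta.\ s \xrightarrow{A} \Delta\}$ the set of states that can perform some action from $A$. Then we define a transition relation for distributions by letting $\Delta \xrightarrow{A} \Theta$ if $s \xrightarrow{A} \Delta_s$ for each $s \in S_A \cap \lceil\Delta\rceil$ and $\Theta = \frac{1}{\Delta(S_A)}\sum_{s\in S_A\cap\lceil\Delta\rceil} \Delta(s)\cdot\Delta_s$.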
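To make the set-labelled transition concrete, here is a small sketch. It assumes that the target is the sum of the chosen successors of the states able to move, normalised by the mass $\Delta(S_A)$ of those states, which is the form used later in the proof of Proposition 6.4. The dict encoding and the name `set_transition` are hypothetical.

```python
# Sketch: the set-labelled transition on subdistributions, with normalisation by
# the mass of the states that can move (an assumption stated in the text above).

def set_transition(delta, chosen):
    """chosen maps each state of the support that can perform some action in A
    to the successor distribution picked for it; other states are left out."""
    enabled_mass = sum(p for s, p in delta.items() if s in chosen and p > 0)
    if enabled_mass == 0:
        return None  # no state in the support can move, so there is no transition
    result = {}
    for s, p in delta.items():
        if s in chosen:
            for state, q in chosen[s].items():
                result[state] = result.get(state, 0.0) + p * q / enabled_mass
    return result

# Three equally likely states, of which only s1 and s2 can perform an action in A.
third = 1.0 / 3.0
delta = {"s1": third, "s2": third, "s3": third}
target = set_transition(delta, {"s1": {"x": 1.0}, "s2": {"y": 1.0}})
```

Because of the normalisation the target is a full distribution whenever the chosen successors are, regardless of how much mass sits on states that cannot move.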


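Definition 6.7 Let $\sim_4 \subseteq \mathcal{D}_{sub}(S)\times\mathcal{D}_{sub}(S)$ be the largest symmetric relation such that $\Delta \sim_4 \Theta$ implies

1. $|\Delta| = |\Theta|$ and $\Delta(S_A) = \Theta(S_A)$ for any $A \subseteq L$,

2. for each $A \subseteq L$, whenever $\Delta \xrightarrow{A} \Delta'$, there exists $\Theta'$ with $\Theta \xrightarrow{A} \Theta'$ and $\Delta' \sim_4 \Theta'$.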


The lifting operation given in Definition 6.1 enjoys a few useful properties 
[11, Section 3.3]. 


Proposition 6.1 Let $\Delta$ and $\Theta$ be two subdistributions over $S$ and $T$, respectively, and $R \subseteq S\times T$. Then $\Delta\ R^\dagger\ \Theta$ if and only if there are two collections of states, $\{s_i\}_{i\in I}$ and $\{t_i\}_{i\in I}$, and a collection of probabilities $\{p_i\}_{i\in I}$, for some finite index set $I$, such that $\sum_{i\in I} p_i \le 1$ and $\Delta, \Theta$ can be decomposed as follows:

1. $\Delta = \sum_{i\in I} p_i\cdot\overline{s_i}$
2. $\Theta = \sum_{i\in I} p_i\cdot\overline{t_i}$
3. for each $i \in I$ we have $s_i\ R\ t_i$.
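The characterisation in Proposition 6.1 gives a simple way to certify that two subdistributions are related by a lifted relation: exhibit the three collections and check the conditions. The sketch below verifies such a witness. It is an illustration only, and the encoding of the witness as a list of triples as well as the name `check_lifting_witness` are assumptions.

```python
# Sketch: verifying a decomposition witness for a lifted relation (illustrative names).

def check_lifting_witness(delta, theta, witness, related, tol=1e-9):
    """witness is a list of (p_i, s_i, t_i); related(s, t) decides whether s R t."""
    if sum(p for p, _, _ in witness) > 1.0 + tol:
        return False
    left, right = {}, {}
    for p, s, t in witness:
        if p < 0 or not related(s, t):
            return False
        left[s] = left.get(s, 0.0) + p      # builds sum_i p_i * point(s_i)
        right[t] = right.get(t, 0.0) + p    # builds sum_i p_i * point(t_i)

    def same(d1, d2):
        return all(abs(d1.get(k, 0.0) - d2.get(k, 0.0)) <= tol
                   for k in set(d1) | set(d2))

    return same(left, delta) and same(right, theta)

pairs = {("s1", "t1"), ("s2", "t1")}
ok = check_lifting_witness(
    {"s1": 0.5, "s2": 0.25}, {"t1": 0.75},
    [(0.5, "s1", "t1"), (0.25, "s2", "t1")],
    lambda s, t: (s, t) in pairs,
)
```

The check only validates a given witness; finding one in general amounts to a flow or coupling problem, which is outside the scope of this sketch.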


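Proposition 6.2 If $R_1 \subseteq R_2$ then $(R_1)^\dagger \subseteq (R_2)^\dagger$.

Proposition 6.3 Suppose $R \subseteq S\times T$ and $\sum_{i\in I} p_i \le 1$. If $(\sum_{i\in I} p_i\cdot\Delta_i)\ R^\dagger\ \Theta$ then $\Theta = \sum_{i\in I} p_i\cdot\Theta_i$ for some set of distributions $\{\Theta_i\}_{i\in I}$ such that $\Delta_i\ R^\dagger\ \Theta_i$ for each $i \in I$.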


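Proposition 6.4 $\sim_4 \subsetneq \sim_d$.

Proof: Let us construct the following relation

$$R = \{(p\cdot\Delta,\ p\cdot\Theta) \mid p \in [0,1] \wedge \Delta \sim_4 \Theta\}$$

and show that it is a distribution-based bisimulation in the sense of Definition 6.3. Suppose $(p\cdot\Delta, p\cdot\Theta) \in R$ for some subdistributions $\Delta, \Theta$ with $\Delta \sim_4 \Theta$ and $p \in [0,1]$. We observe that $|p\cdot\Delta| = p\cdot|\Delta| = p\cdot|\Theta| = |p\cdot\Theta|$. Now let $p\cdot\Delta \xrightarrow{a} \Delta'$. It is necessarily the case that $\Delta' = p\cdot\Delta''$ and $\Delta \xrightarrow{a} \Delta''$ for some $\Delta''$. Then for each $s \in \lceil\Delta\rceil$ there exists some $\Delta_s$ such that $\Delta'' = \sum_{s\in\lceil\Delta\rceil} \Delta(s)\cdot\Delta_s$ with $\overline{s} \xrightarrow{a} \Delta_s$, i.e. either $s \xrightarrow{a}_c \Delta_s$ or $\Delta_s = \varepsilon$ if $s \not\xrightarrow{a}$. Note that $s \xrightarrow{a}_c \Delta_s$ if and only if $s \xrightarrow{\{a\}} \Delta_s$. It follows that

$$\Delta \xrightarrow{\{a\}} \Delta''' = \frac{1}{\Delta(S_{\{a\}})}\sum_{s\in S_{\{a\}}\cap\lceil\Delta\rceil} \Delta(s)\cdot\Delta_s = \frac{1}{\Delta(S_{\{a\}})}\,\Delta''.$$

Since $\Delta \sim_4 \Theta$, there exists some $\Theta'''$ with $\Theta \xrightarrow{\{a\}} \Theta'''$ and $\Delta''' \sim_4 \Theta'''$. By definition $\Theta'''$ must be in the form

$$\frac{1}{\Theta(S_{\{a\}})}\sum_{s\in S_{\{a\}}\cap\lceil\Theta\rceil} \Theta(s)\cdot\Theta_s$$

with $s \xrightarrow{\{a\}} \Theta_s$, i.e. $s \xrightarrow{a}_c \Theta_s$, for any $s \in S_{\{a\}}\cap\lceil\Theta\rceil$. By taking $\Theta_s = \varepsilon$ for any $s$ with $s \notin S_{\{a\}}$, it follows that $\Theta \xrightarrow{a} \Theta'' = \sum_{s\in\lceil\Theta\rceil} \Theta(s)\cdot\Theta_s = \Theta(S_{\{a\}})\cdot\Theta'''$. We see from $\Delta \sim_4 \Theta$ that $\Delta(S_{\{a\}}) = \Theta(S_{\{a\}})$. In summary, we can infer that

$$\begin{aligned}
p\cdot\Delta &\xrightarrow{a} p\cdot\Delta'' = p\cdot\Delta(S_{\{a\}})\cdot\Delta''' \\
p\cdot\Theta &\xrightarrow{a} p\cdot\Theta'' = p\cdot\Delta(S_{\{a\}})\cdot\Theta'''
\end{aligned}$$

It follows from $\Delta''' \sim_4 \Theta'''$ that $(p\cdot\Delta(S_{\{a\}})\cdot\Delta''',\ p\cdot\Delta(S_{\{a\}})\cdot\Theta''') \in R$. Therefore, we have verified that $R \subseteq \sim_d$, thus $\sim_4 \subseteq \sim_d$.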

We now prove that $\sim_d \not\subseteq \sim_4$. Consider the example in Figure 4. From
state s there is a unique transition s “> A with A = 551 + $52 + 553° This 
can be matched by t ~> ©, where © = 3h + st, because A ~q © holds. 


To see this, we observe that A can initiate three transitions: A “> 35a, 


Aves 235, and A > 55c3 they can be matched by © > 2 ta, 9 2; 2ty, 


and © + ate, respectively. Similarly, the three outgoing transitions from O 
can be matched by the three transitions of A. Therefore, we have verified 
that 5 ~q t. However, we have A 44 ©. From A we have the transition 


1b . . . 
A ie an N= 7 ‘Sat r - 3. From O there are two transitions labelled with 


{a,b}, namely O Aes t, and @ pl ty. Neither of them is able to match 


the transition from A. To see this, we observe that A’(S,,)) = 5 while 
ta( Sy) = 0, and A’ (Sta) a 3 while to(S'¢q}) = 0. It follows that 5 %4 t. 


Figure 4: $\overline{s} \sim_d \overline{t}$ but $\overline{s} \not\sim_3 \overline{t}$ and $\overline{s} \not\sim_4 \overline{t}$.


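Proposition 6.5 Suppose $\Delta, \Theta \in \mathcal{D}(S)$. Then $\Delta \sim_1 \Theta$ if and only if $\Delta \sim_d \Theta$.

Proof: Let us define the following two relations:

$$\begin{aligned}
R_1 &= \{(\Delta, \Theta) \mid \Delta, \Theta \in \mathcal{D}_{sub}(S) \wedge p \in [0,1] \wedge (\Delta + p\cdot\overline{\bot} \sim_1 \Theta + p\cdot\overline{\bot})\} \\
R_2 &= \{(\Delta + p\cdot\overline{\bot},\ \Theta + p\cdot\overline{\bot}) \mid \Delta, \Theta \in \mathcal{D}_{sub}(S) \wedge p \in [0,1] \wedge \Delta \sim_d \Theta\}.
\end{aligned}$$

We can prove that $R_1 \subseteq \sim_d$ and $R_2 \subseteq \sim_1$. As an example, let us consider $R_1$ and suppose $(\Delta, \Theta) \in R_1$ with $\Delta \xrightarrow{a} \Delta'$. Then there is some probability $p$ with $\Delta + p\cdot\overline{\bot} \sim_1 \Theta + p\cdot\overline{\bot}$. By Lemma 4.1, $\Delta' = \sum_{s\in\lceil\Delta\rceil} \Delta(s)\cdot\Delta_s$ and for each $s \in \lceil\Delta\rceil$ we have $\overline{s} \xrightarrow{a} \Delta_s$, i.e. either $s \xrightarrow{a}_c \Delta_s$ or $\Delta_s = \varepsilon$. Let $\Delta'' = \Delta' + (|\Delta| - |\Delta'|)\cdot\overline{\bot}$. It is easy to see that

$$\Delta + p\cdot\overline{\bot} \xrightarrow{a}_c \Delta'' + p\cdot\overline{\bot}.$$

It follows from $\Delta + p\cdot\overline{\bot} \sim_1 \Theta + p\cdot\overline{\bot}$ that

$$\Theta + p\cdot\overline{\bot} \xrightarrow{a}_c \Theta'' + p\cdot\overline{\bot}$$

and $\Delta'' + p\cdot\overline{\bot} \sim_1 \Theta'' + p\cdot\overline{\bot}$ for some $\Theta''$. Observe that $\Theta''$ must take the form $\Theta' + (|\Theta| - |\Theta'|)\cdot\overline{\bot}$ with $\Theta' = \sum_{s\in\lceil\Theta\rceil} \Theta(s)\cdot\Theta_s$ and for each $s \in \lceil\Theta\rceil$ either $s \xrightarrow{a}_c \Theta_s$ or $\Theta_s = \varepsilon$. It follows that $\Theta \xrightarrow{a} \Theta'$.

Since $\Delta, \Theta$ are subdistributions over $S$ and $\Delta + p\cdot\overline{\bot} \sim_1 \Theta + p\cdot\overline{\bot}$, we know that

$$|\Delta| = \Delta(S) = (\Delta + p\cdot\overline{\bot})(S) = (\Theta + p\cdot\overline{\bot})(S) = \Theta(S) = |\Theta|.$$

It follows from $\Delta'' + p\cdot\overline{\bot} \sim_1 \Theta'' + p\cdot\overline{\bot}$ that

$$(\Delta' + (|\Delta| - |\Delta'|)\cdot\overline{\bot} + p\cdot\overline{\bot}) \sim_1 (\Theta' + (|\Theta| - |\Theta'|)\cdot\overline{\bot} + p\cdot\overline{\bot}).$$

As a result, we obtain $(\Delta', \Theta') \in R_1$ and

$$\begin{aligned}
|\Delta'| = \Delta'(S) &= (\Delta' + (|\Delta| - |\Delta'|)\cdot\overline{\bot} + p\cdot\overline{\bot})(S) \\
&= (\Theta' + (|\Theta| - |\Theta'|)\cdot\overline{\bot} + p\cdot\overline{\bot})(S) = \Theta'(S) = |\Theta'|.
\end{aligned}$$

Therefore, we have established that $R_1 \subseteq \sim_d$. By similar arguments, it can be shown that $R_2 \subseteq \sim_1$.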


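Proposition 6.6 $(\sim_s)^\dagger \subsetneq (\sim_s^c)^\dagger = \sim_2 \subsetneq \sim_3 \subsetneq \sim_d$.

Proof: Since $\sim_s^c$ allows for combined transitions while $\sim_s$ uses plain transitions only, it is obvious that $\sim_s \subseteq \sim_s^c$. By the monotonicity of the lifting operation, Proposition 6.2, we can infer that $(\sim_s)^\dagger \subseteq (\sim_s^c)^\dagger$. Moreover, the inclusion is strict. For example, consider the two point distributions $\overline{s}$ and $\overline{t}$ in Figure 5. We have $\overline{s}\ (\sim_s^c)^\dagger\ \overline{t}$ but not $\overline{s}\ (\sim_s)^\dagger\ \overline{t}$, because neither of the two plain transitions of $t$ on its own matches the transition from $s$ to $\frac{1}{2}\overline{s_1} + \frac{1}{2}\overline{s_2}$, but a combination of them will do.

Figure 5: $\overline{s}\ (\sim_s^c)^\dagger\ \overline{t}$ but not $\overline{s}\ (\sim_s)^\dagger\ \overline{t}$.

Next, let us prove $(\sim_s^c)^\dagger \subseteq \sim_2$. Suppose $\Delta, \Theta$ are two subdistributions with $\Delta\ (\sim_s^c)^\dagger\ \Theta$. By Proposition 6.1 we can decompose $\Delta$ and $\Theta$ as follows:

• $\Delta = \sum_{i\in I} p_i\cdot\overline{s_i}$
• $\Theta = \sum_{i\in I} p_i\cdot\overline{t_i}$
• for each $i \in I$ we have $s_i \sim_s^c t_i$.

It is obvious that $|\Delta| = \sum_{i\in I} p_i = |\Theta|$. It remains to check that $\Delta$ and $\Theta$ can match each other's transitions. Suppose $\Delta \xrightarrow{a}_c \Delta'$. Then $\Delta' = \sum_{s\in\lceil\Delta\rceil} \Delta(s)\cdot\Delta_s$, where for each $s \in \lceil\Delta\rceil$ we have $s \xrightarrow{a}_c \Delta_s$. By Proposition 6.3 we can decompose $\Theta$ as

$$\Theta = \sum_{s\in\lceil\Delta\rceil} \Delta(s)\cdot\Theta_s \qquad (18)$$

such that $\overline{s}\ (\sim_s^c)^\dagger\ \Theta_s$ for each $s \in \lceil\Delta\rceil$. By Proposition 6.1 we can derive that $s \sim_s^c t_s$ for each $t_s \in \lceil\Theta_s\rceil$. From $s \xrightarrow{a}_c \Delta_s$, we can find some matching transition $t_s \xrightarrow{a}_c \Theta_{t_s}$ with $\Delta_s\ (\sim_s^c)^\dagger\ \Theta_{t_s}$ for each $t_s \in \lceil\Theta_s\rceil$. It follows that

$$\Delta_s\ (\sim_s^c)^\dagger\ \Big(\sum_{t_s\in\lceil\Theta_s\rceil} \Theta_s(t_s)\cdot\Theta_{t_s}\Big). \qquad (19)$$

Let $\Theta_s' = \sum_{t_s\in\lceil\Theta_s\rceil} \Theta_s(t_s)\cdot\Theta_{t_s}$ and $\Theta' = \sum_{s\in\lceil\Delta\rceil} \Delta(s)\cdot\Theta_s'$. Then $\Theta_s \xrightarrow{a}_c \Theta_s'$ is clearly a valid transition for each distribution $\Theta_s$ where $s \in \lceil\Delta\rceil$. Combining this with (18), we obtain

$$\Theta \xrightarrow{a}_c \Theta'. \qquad (20)$$

From (19), we derive that

$$\Delta'\ (\sim_s^c)^\dagger\ \Theta'. \qquad (21)$$

By Proposition 6.3, any decomposition of $\Delta$ as $\sum_{i\in I} p_i\cdot\Delta_i$ can be matched by some decomposition of $\Theta$ as $\sum_{i\in I} p_i\cdot\Theta_i$ with $\Delta_i\ (\sim_s^c)^\dagger\ \Theta_i$ as desired. Therefore, we have completed the proof of $(\sim_s^c)^\dagger \subseteq \sim_2$.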
In order to prove the other inclusion, $\sim_2 \subseteq (\sim_s^c)^\dagger$, it suffices to construct the following relation

$$R = \{(s,t) \mid \overline{s} \sim_2 \overline{t}\}$$

and show that it is a strong probabilistic bisimulation, which means $R \subseteq \sim_s^c$. The proof makes use of the property that $\Delta \sim_2 \Theta$ implies $\Delta\ R^\dagger\ \Theta$.

It is easy to see that $\sim_3$ is a relaxation of $\sim_2$ by requiring decompositions for inconsistent subdistributions rather than for any subdistribution in general. Furthermore, $\sim_3$ is strictly coarser than $\sim_2$. Consider the two states $s$ and $t$ in Figure 6. We see that $\overline{s} \not\sim_2 \overline{t}$ because the point distribution $\overline{s_1}$ reachable from $s$ is not related to the subdistribution $(\frac{1}{2}\cdot\overline{t_1} + \frac{1}{2}\cdot\overline{t_2})$ reachable from $t$ if their decompositions need to be compared: the former cannot be decomposed into two subdistributions that can mimic $\overline{t_1}$ and $\overline{t_2}$, respectively. However, it is straightforward to check that $\overline{s} \sim_3 \overline{t}$.

Figure 6: $\overline{s} \sim_3 \overline{t}$ and $\overline{s} \sim_4 \overline{t}$ but $\overline{s} \not\sim_2 \overline{t}$.

In [22, Theorem 7.1.1] it is proven that $\sim_3$ is strictly included in $\sim_1$. But Proposition 6.5 says that $\sim_1$ coincides with $\sim_d$. Therefore, $\sim_3$ is strictly included in $\sim_d$. As a matter of fact, it is also not difficult to give a direct proof.
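Proposition 6.7 $\sim_4$ is incomparable with the three relations $(\sim_s)^\dagger$, $\sim_2$ and $\sim_3$.

Proof: In [22, Theorem 7.1.2] it is shown that $\sim_3$ is incomparable with $\sim_4$. Note that $\sim_4 \not\subseteq \sim_3$ implies $\sim_4 \not\subseteq \sim_2$ and $\sim_4 \not\subseteq (\sim_s)^\dagger$ by Proposition 6.6.

Let us consider the diagram in Figure 7. Let $\Delta = \frac{1}{2}\cdot\overline{s_1} + \frac{1}{2}\cdot\overline{s_2}$. Observe that $s_1 \sim_s s_2$ and thus $\overline{s_1}\ (\sim_s)^\dagger\ \Delta$. By Proposition 6.6, we also have $\overline{s_1} \sim_2 \Delta$ and $\overline{s_1} \sim_3 \Delta$. However, we have $\overline{s_1} \not\sim_4 \Delta$ because the transition $\Delta \xrightarrow{\{a,b\}} \frac{1}{2}\overline{s_3} + \frac{1}{2}\overline{s_6}$ can be matched by neither $\overline{s_1} \xrightarrow{\{a,b\}} \overline{s_3}$ nor $\overline{s_1} \xrightarrow{\{a,b\}} \overline{s_4}$, the only two $\{a,b\}$-labelled transitions from $\overline{s_1}$.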


Figure 7: 37 (~s)! (5: 
The last four propositions can be recapitulated by the following theorem. 


Theorem 6.1 Figure 8 depicts the relationship between the seven bisimilar- 
ities for distributions mentioned above. 


If we confine ourselves to deterministic pLTSs, then combined transitions add nothing new to ordinary transitions and thus $\sim_s^c$ degenerates into $\sim_s$, but the rest of Figure 8 remains unchanged.


7 Other Related Work 


Metrics for probabilistic transition systems were first suggested by Giacalone et al. [28] to formalize a notion of distance between processes. They are used also in [36, 39] to give denotational semantics for deterministic models. De Vink and Rutten [9] show that discrete probabilistic transition systems can be viewed as coalgebras. They consider the category of complete ultrametric spaces. Similar ultrametric spaces are considered by den Hartog in [10]. In [51] Ying proposes a notion of bisimulation index for the usual labelled transition systems, by using ultrametrics on actions instead of using pseudometrics on states. A quantitative linear-time-branching-time spectrum for non-probabilistic systems is given in [21].

[Figure 8 diagram: $\sim_1 = \sim_d \ \to\ \sim_3 \ \to\ \sim_2 = (\sim_s^c)^\dagger \ \to\ (\sim_s)^\dagger$, with $\sim_4$ drawn on a separate row.]

Figure 8: Relationship between the seven bisimilarities for distributions. An arrow pointing from one relation to another means that the former relation is strictly coarser than the latter. Two relations are incomparable if there is no path from one to the other.


Metrics for deterministic systems are extensively studied. Desharnais 
et al. [17] propose a logical pseudometric for labelled Markov chains, which 
is a deterministic model of probabilistic systems. A similar pseudometric is 
defined by van Breugel and Worrell [49] via the terminal coalgebra of a functor 
based on a metric on the space of Borel probability measures. Essentially 
the same metric is investigated in the setting of continuous Markov decision 
processes [26]. The metric of [17, 50, 26] works for continuous probabilistic 
transition systems, while in this work we concentrate on discrete systems with 
nondeterminism. In the future it would be interesting to see how to generalise 
our results to continuous systems. In [48] van Breugel and Worrell present 
a polynomial-time algorithm to approximate their coalgebraic distances. 
Furthermore, van Breugel et al. propose an algorithm to approximate a 
behavioural pseudometric without discount [47]. In [25] a sampling algorithm 
for calculating bisimulation distances in Markov decision processes is shown 
to have good performance. Later on, more efficient algorithms for computing 
probabilistic bisimilarity distances for probabilistic automata have been 
developed in, e.g., [45, 1]. In [7, 8] the probabilistic bisimulation metric on
game structures is characterised by a quantitative $\mu$-calculus. Algorithms
for game metrics are proposed in [3, 41]. A notion of bisimulation distance
for distributions is proposed in [24]. It is defined for full distributions 
only and the definition itself has to be given in terms of fully enabled 
transition systems. Our distribution-based bisimulation metric generalises it 
to subdistributions, and allowing transitions between subdistributions has 
the advantage of allowing our definition to be more direct. 

Metrics for nondeterministic probabilistic systems are considered in [18], 
where Desharnais et al. deal with labelled concurrent Markov chains (similar 
to pLTSs, this model can be captured by the simple probabilistic automata 
of [42]). They show that the greatest fixed point of a monotonic function 
on pseudometrics corresponds to the weak probabilistic bisimilarity of [40]. 
In [27] a notion of uniform continuity is proposed to be an appropriate property of probabilistic processes for compositional reasoning with respect to $d_s$. In [44] a notion of trace metric is proposed for pLTSs and a tool is developed to compute the trace metric. In [2] the boolean-valued logic from [13] is used to characterise state-based bisimulation metrics. It crucially relies on distribution formulae of the form $\bigoplus_{i\in I} p_i\varphi_i$, which is demanding in the sense that if $\Delta$ satisfies that formula then there is some decomposition $\Delta = \sum_{i\in I} p_i\cdot\Delta_i$ such that for each $i \in I$ all the states in the support of $\Delta_i$ must satisfy $\varphi_i$.

Metrics for other quantitative models are also investigated. In [12] a 
notion of bisimulation metric is proposed that extends the approach of [18, 17] 
to a more general framework called action-labelled quantitative transition 
systems. In [6] de Alfaro et al. consider metric transition systems in which 
the propositions at each state are interpreted as elements of metric spaces. 
In that setting, trace equivalence and bisimulation give rise to linear and 
branching distances that can be characterised by quantitative versions of 
linear-time temporal logic [38] and the $\mu$-calculus [35].


8 Concluding Remarks 


We have considered two behavioural pseudometrics for probabilistic labelled 
transition systems where nondeterminism and probabilities co-exist. They 
correspond to state-based and distribution-based bisimulations. Our modal 
characterisation of the state-based bisimulation metric is much simpler than 
an earlier proposal by Desharnais et al. since we only use two non-expansive 
operators, negation and testing, rather than the general class of non-expansive 
operators. Our modal characterisation of the distribution-based bisimulation 
metric is new. The characterisations are shown to be sound and complete.
We have also shown that the distribution-based bisimulation metric is a 
lower bound of the state-based bisimulation metric lifted to distributions. 
In addition, we have compared the bisimilarities entailed by the two metrics 
with a few other distribution-based bisimilarities. 

In the current work we have not distinguished internal actions from 
external ones. But it is not difficult to make the distinction and abstract 
away internal actions so as to introduce weak versions of bisimulation metrics. 
In a finite-state and finitely branching pLTS, the set of subdistributions 
reachable from a state by weak transitions may be infinite but can be 
represented by the convex closure of a finite set [11]. This entails that the 
logical characterisation of weak bisimulation metrics would be similar to 
those presented here. 